Preface



SAC 2003 was the tenth in a series of annual workshops on Selected Areas in 
Cryptography. This marked the third time that the workshop had been held at 
Carleton University in Ottawa with previous workshops being held there in 1995 
and 1997. The intent of the SAC workshops is to provide a relaxed atmosphere in 
which researchers in cryptography can present and discuss new work on selected 
areas of current interest. 

The themes for the SAC 2003 workshop were: 

— design and analysis of symmetric key cryptosystems, 

— primitives for symmetric key cryptography, including block and stream ci- 
phers, hash functions, and MACs, 

— efficient implementation of cryptographic systems in public and symmetric 
key cryptography, 

— cryptographic solutions for Web services security, 

— cryptography and security of trusted and distributed systems. 

A total of 85 papers were submitted to SAC 2003, two of which were subse- 
quently withdrawn. After a review process that had all papers reviewed by at 
least three referees, 25 papers were accepted for presentation at the workshop. 
We would like to thank all of the authors who submitted papers, whether or 
not those papers were accepted, for submitting their high-quality work to this 
workshop. 

As well, we were fortunate to have the following two invited speakers at SAC 
2003: 

— Nicolas Courtois (Schlumberger Smart Cards) 

Algebraic attacks and design of block ciphers, stream ciphers, and multivari- 
ate public key schemes 

— Virgil D. Gligor (University of Maryland) 

Cryptolight: Perspective and Status 

SAC 2003 was memorable for all those involved, not only because of the 
quality of the technical program, but also because of the massive power blackout 
that occurred. On August 14, 2003, much of the eastern part of the United States
and most of the province of Ontario were plunged into darkness. The city of
Ottawa was without power from about 4:00 pm on August 14 through most of 
the day on August 15. Despite the lack of power, the workshop carried on in an 
“unplugged” format with all remaining talks presented in a makeshift lecture 
hall using chalk and blackboards. The staff of the Tour and Conference Centre 
at Carleton University deserve special recognition for helping the chairs make 
alternate arrangements to deal with the blackout. We would also like to thank 
all SAC attendees and, in particular, the presenters who persevered and made 
SAC 2003 a success, despite the trying circumstances. 

We appreciate the hard work of the SAC 2003 Program Committee. We
are also very grateful to the many others who participated in the review
process: Gildas Avoine, Florent Bersani, Alex Biryukov, Eric Brier, Jean-Sébastien
Coron, Joan Daemen, Christophe De Cannière, Jean-François Dhem, Zhi (Judy)
Fu, Virgil Gligor, Florian Hess, Don Johnson, Pascal Junod, Hans-Joachim
Knobloch, Joe Lano, John Malone-Lee, Tom Messerges, Jean Monnerat, Svetla
Nikova, Dan Page, Pascal Paillier, Matthew Parker, Holger Petersen, Michael
Quisquater, Håvard Raddum, Christophe Tymen, Frederik Vercauteren, and
Michael Wiener. We apologize for any unintended errors or omissions in this
list.

We are also appreciative of the financial support provided by Carleton Uni- 
versity, Cloakware Corporation, Entrust, Inc., Mitsubishi Electric, and Queen’s 
University Kingston. 

Special thanks are due to Sandy Dare for providing administrative assistance 
and to the local arrangements committee consisting of Mike Just, Tao Wan, and 
Dave Whyte for their help. 

On behalf of all those involved in organizing the workshop, we thank all the 
workshop participants for making SAC 2003 a success! 

Low Cost Security: Explicit Formulae
for Genus-4 Hyperelliptic Curves



Jan Pelzl, Thomas Wollinger, and Christof Paar 



Department of Electrical Engineering and Information Sciences 
Communication Security Group (COSY) 
Ruhr-Universität Bochum, Germany
{pelzl, wollinger, cpaar}@crypto.rub.de



Abstract. It is widely believed that genus four hyperelliptic curve cryp- 
tosystems (HECC) are not attractive for practical applications because 
of their complexity compared to systems based on lower genera, espe- 
cially elliptic curves. Our contribution shows that for low cost security 
applications genus-4 hyperelliptic curves (HEC) can outperform genus-2 
HEC and that we can achieve a performance similar to genus-3 HEC. 
Furthermore our implementation results show that a genus-4 HECC is 
an alternative cryptosystem to systems based on elliptic curves. 

In the work at hand we present for the first time explicit formulae for
genus-4 HEC, resulting in a 60% speed-up compared to the best published
results. In addition, we implemented genus-4 HECC on a Pentium4
and an ARM microprocessor. Our implementations on the ARM show
that genus-4 HECC are only a factor of 1.66 slower than genus-2
curves for a group order of $\approx 2^{190}$. For the same group order, ECC
and genus-3 HECC are about a factor of 2 faster than genus-4 curves
on the ARM. The two most surprising results are: 1) for low cost security
applications, namely considering an underlying group of order $2^{128}$,
HECC with genus 4 outperform genus-2 curves by a factor of 1.46 and
have similar performance to genus-3 curves on the ARM, and 2) when
compared to genus-2 and genus-3, genus-4 HECC are better suited to
embedded microprocessors than to general purpose processors.

Keywords: Hyperelliptic curves, genus four, explicit formulae, efficient 
implementation, low cost security, embedded application, comparison 
HECC vs. ECC 

1 Introduction 

It is widely recognized that data security will play a central role in the design 
of future IT systems. One of the major tools to provide information security is 
public-key cryptography. Additionally, one notices that more and more IT ap- 
plications are realized as embedded systems. In fact, 98% of all microprocessors 
sold today are embedded in household appliances, vehicles, and machines on fac- 
tory floors [9, 3], whereas only 2% are used in PCs and workstations. Embedded 
processors have 100 to 1000 times lower computational power than conventional
PCs. In addition to many other challenges, the integration of security and
privacy in existing and new embedded applications will be a major one.

Since the invention of public-key (PK) cryptography in 1976, three dif- 
ferent variants of PK cryptosystems of practical relevance have been intro- 
duced, namely cryptosystems based on the difficulty of integer factorization 
(e.g. RSA [36]), solving the discrete logarithm problem in finite fields (e.g. Diffie- 
Hellman [6]), and the discrete logarithm problem (DLP) in the group of points 
of an elliptic curve (EC) over a finite field [29, 17]. Hyperelliptic curve cryptosys- 
tems (HECC) are a generalization of elliptic curve cryptosystems (ECC) that 
were suggested in 1988 for cryptographic applications [18]. 

Considering the implementation aspects of the three public-key variants, one 
notices that a major difference is the bit-length of the operands. It is widely 
accepted that for commercial applications one needs 1024-bit operands for RSA 
or Diffie-Hellman. In the case of ECC or HECC applications, a group order of 
size $\approx 2^{160}$ is believed to be sufficient for moderate long-term security. In this
contribution we consider genus-4 HECC over $\mathbb{F}_q$ and therefore we will need at
least $4 \cdot \log_2 q \approx 160$. In particular, for these curves, we will need a field $\mathbb{F}_q$ with
$|\mathbb{F}_q| \approx 2^{40}$, i.e., 40-bit long operands. However, in many low cost and embedded
applications lower security margins are adequate. In practice, if a group order
of $2^{128}$ is sufficient, the operations can be performed with an operand length of
32 bits. Thus, the underlying field operations can be implemented very efficiently
when working with 32-bit microprocessors (e.g. ARM). It is important to point out
that the small field sizes and the resulting short operand sizes of HECC compared
to other cryptosystems make HECC especially promising for use in embedded
environments. We discuss the security of such curves in Section 4.2.
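
The relation between group order, genus, and operand length described above can be made concrete with a small calculation. The following Python sketch is illustrative only: it assumes a Jacobian of order roughly $q^g$, and the helper names `operand_bits` and `words_needed` are hypothetical, not part of the implementation described in this paper.

```python
import math


def operand_bits(group_order_bits: int, genus: int) -> int:
    """Field size in bits so that q**genus reaches about 2**group_order_bits."""
    return math.ceil(group_order_bits / genus)


def words_needed(bits: int, word_size: int = 32) -> int:
    """Machine words needed to hold one field element."""
    return math.ceil(bits / word_size)


for target in (160, 128):
    for genus in (1, 2, 3, 4):
        bits = operand_bits(target, genus)
        print(f"group order 2^{target}, genus {genus}: "
              f"{bits}-bit operands, {words_needed(bits)} word(s) of 32 bits")
```

Under these assumptions, genus 4 is the only case among the four in which a group order of about $2^{128}$ fits a field element into a single 32-bit word.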

Our Contributions 

The work at hand presents for the first time explicit formulae for genus-4 curves. 
Genus-4 HECC did not draw a lot of attention in the past because they seem 
to be far less efficient than genus-2 HECC, genus-3 HECC, and ECC. Our con- 
tribution is a major step in accelerating this kind of cryptosystem and contrary 
to common belief we were able to develop explicit formulae that perform the 
scalar multiplication 72% and 60% faster than previous work by Cantor [5] and 
Nagao [32], respectively. 

Genus-4 HECC are well suited for the implementation of public-key cryptosystems
in constrained environments because the underlying arithmetic is performed
with relatively small operand bit-lengths. In this contribution, we present
our implementation of this cryptosystem on an ARM and a Pentium microprocessor.
We were able to perform a 160-bit scalar multiplication in 172 msec on
the ARM@80MHz and in 6.9 msec on the Pentium4@1.8GHz. In addition, our
implementations show that genus-4 HECC are only a factor of 1.66 and 2.08
slower than genus-2 and genus-3 curves, respectively, considering a group order
of $\approx 2^{190}$. Compared to ECC, the genus-4 HECC are a factor of 2 slower for the
same group order.






Genus-4 HEC are well suited, especially for cryptographic applications with
short term security. Performing arithmetic with 32-bit operands only, genus-4
HECC allow for a security comparable to that of 128-bit ECC. We implemented
genus-4 HECC with underlying field arithmetic for 32 bits. In this case one is
able to perform arithmetic with only one word. Contrary to the general case,
the implementation of genus-4 curves in groups of order $\approx 2^{128}$ outperforms
genus-2 curves by a factor of about 1.5. Furthermore, our implementation shows
that HECC with genus three and four have similar performance considering a
group order of $\approx 2^{128}$.

The remainder of the paper is organized as follows. Section 2 summarizes 
contributions dealing with previous implementations and efficient formulae of 
genus-4 HECC. Section 3 gives a brief overview of the mathematical back- 
ground related to HECC and Section 4 considers the security of the implemented 
HECCs. Sections 5 and 6 present our new explicit formulae for genus-4 curves 
and methodology used for our implementation. Finally, we end this contribution 
with a discussion of our results and some conclusions. 

2 Previous Work 

We will first summarize previous improvements on genus-4 HEC group opera- 
tions and second introduce implementations published in earlier contributions. 



Improvements to HECC Group Operations of Genus-4 HECC. Cantor [5]
presented algorithms to perform the group operations on HEC in 1987.
In recent years, there has been extensive research to speed up
the group operations on genus-2 HECC [32, 16, 27, 30, 43, 23, 24, 25] and
genus-3 HECC [32, 22, 34].

Only Nagao [32] tried to improve Cantor's algorithm for higher genera.
Nagao evaluated the computational cost of the group operations by applying
the stated improvements for genus $2 \le g \le 10$. The most efficient group addition
for genus-4 curves needs 2I + 289M/S or 3I + 286M/S (depending on the cost of
the field inversion compared to multiplications, one or the other is more efficient).
I refers to field inversion, M to field multiplication, S to field squaring, and M/S
to field multiplications or squarings, since squarings are assumed to be of the
same complexity as multiplications in these publications. For the computation
of a group doubling in genus-4 curves one has to perform 2I + 268M/S or 3I +
260M/S. Notice that the ideas proposed in [32] are used to improve polynomial
arithmetic.



Genus-4 HECC Implementations. Since HECC were proposed, there have
been several software implementations on general purpose machines [21, 38, 42,
39, 27, 30, 22, 23] and publications dealing with hardware implementations of
HECC [46, 4]. Only very recently, work dealing with the implementation of HECC
on embedded systems was published in [33, 34].

Table 1. Execution times of recent HEC implementations in software

reference  processor            genus  field                   t_scalarmult
[21]       Pentium@100MHz       4      $\mathbb{F}_{2^{31}}$   1100
[38]       Alpha@467MHz         4      $\mathbb{F}_{2^{41}}$   96.6
           Pentium-II@300MHz    4      $\mathbb{F}_{2^{41}}$   10900
[39]       Alpha21164A@600MHz   4      $\mathbb{F}_{2^{41}}$   43

The results of previous genus-4 HECC software implementations are summa- 
rized in Table 1. All implementations use Cantor’s algorithm with polynomial 
arithmetic. We remark that the contribution at hand is the first genus-4 HECC 
implementation based on explicit formulae. 

3 Mathematical Background 

The mathematical background described in this section is limited to the material 
that is required in our contribution. The interested reader is referred to [19, 28, 
20] for more details. 

3.1 HECC and the Jacobian 

Let $\mathbb{F}$ be a finite field, and let $\overline{\mathbb{F}}$ be the algebraic closure of $\mathbb{F}$. A hyperelliptic
curve $C$ of genus $g \ge 1$ over $\mathbb{F}$ is the set of solutions $(u, v) \in \overline{\mathbb{F}} \times \overline{\mathbb{F}}$ to the equation

$$C : v^2 + h(u)v = f(u)$$

The polynomial $h(u) \in \mathbb{F}[u]$ is of degree at most $g$ and $f(u) \in \mathbb{F}[u]$ is a monic
polynomial of degree $2g + 1$. For odd characteristic it suffices to let $h(u) = 0$ and
to have $f(u)$ square free.

A divisor $D = \sum m_i P_i$, $m_i \in \mathbb{Z}$, is a finite formal sum of $\overline{\mathbb{F}}$-points. The set
of divisors of degree zero will be denoted by $\mathbb{D}^0$. Every rational function on the
curve gives rise to a divisor of degree zero, which is called principal. The set of
all principal divisors is denoted by $\mathbb{P}$. We can define the Jacobian of $C$ over $\mathbb{F}$,
denoted by $J_C(\mathbb{F})$, as the quotient group $\mathbb{D}^0/\mathbb{P}$.

In [5] it is shown that the divisors of the Jacobian can be represented as
a pair of polynomials $a(u)$ and $b(u)$ with $\deg b(u) < \deg a(u) \le g$, with $a(u)$
dividing $b(u)^2 + h(u)b(u) - f(u)$ and where the coefficients of $a(u)$ and $b(u)$ are
elements of $\mathbb{F}$ [31]. In the remainder of this paper, a divisor $D$ represented by
polynomials will be denoted by $\mathrm{div}(a, b)$.

3.2 Group Operations in the Jacobian 

This section gives a brief description of the algorithms used for adding and doubling
divisors on $J_C(\mathbb{F})$. Algorithm 1 describes the group addition. Doubling
a divisor is easier than general addition and therefore, Steps 1, 2, and 3 of
Algorithm 1 can be simplified as follows:

Algorithm 1 Group addition

Require: $D_1 = \mathrm{div}(a_1, b_1)$, $D_2 = \mathrm{div}(a_2, b_2)$
Ensure: $D = \mathrm{div}(a_3, b_3) = D_1 + D_2$

1: $d = \gcd(a_1, a_2, b_1 + b_2 + h) = s_1 a_1 + s_2 a_2 + s_3 (b_1 + b_2 + h)$
2: $a'_0 = a_1 a_2 / d^2$
3: $b'_0 = [s_1 a_1 b_2 + s_2 a_2 b_1 + s_3 (b_1 b_2 + f)] d^{-1} \pmod{a'_0}$
4: $k = 0$
5: while $\deg a'_k > g$ do
6:   $k = k + 1$
7:   $a'_k = (f - b'_{k-1} h - (b'_{k-1})^2) / a'_{k-1}$
8:   $b'_k = (-h - b'_{k-1}) \bmod a'_k$
9: end while
10: Output $(a_3 = a'_k, b_3 = b'_k)$

Simplified Steps 1, 2, and 3 for doubling a divisor $\mathrm{div}(a, b)$:

1: $d = \gcd(a, 2b + h) = s_1 a + s_3 (2b + h)$
2: $a'_0 = a^2 / d^2$
3: $b'_0 = [s_1 a b + s_3 (b^2 + f)] d^{-1} \pmod{a'_0}$
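
To make the polynomial-arithmetic version of the group addition more tangible, the following Python sketch runs Algorithm 1 on a toy genus-4 curve. It is a minimal illustration under several assumptions that are not taken from the paper: odd characteristic with $h = 0$, a hypothetical prime field $\mathbb{F}_{31}$, coprime inputs (so $d = 1$ and $s_3$ is not needed), and a curve $v^2 = \prod_{i=0}^{8}(u - i) + c^2$ constructed only so that the points $(i, \pm c)$ are known in advance. The final normalization of $a_3$ to a monic polynomial is also an addition of this sketch. All function names are hypothetical.

```python
P = 31  # hypothetical toy prime; real systems use much larger fields


def trim(a):
    """Reduce coefficients mod P and drop leading zeros (lowest degree first)."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])


def neg(a):
    return trim([-c for c in a])


def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def divmod_poly(a, b):
    """Quotient and remainder of a by a nonzero polynomial b."""
    a, b = trim(a), trim(b)
    q = [0] * max(len(a) - len(b) + 1, 0)
    lead_inv = pow(b[-1], -1, P)
    while len(a) >= len(b):
        c, k = a[-1] * lead_inv % P, len(a) - len(b)
        q[k] = c
        a = trim([a[i] - (c * b[i - k] if i >= k else 0) for i in range(len(a))])
    return trim(q), a


def xgcd(a, b):
    """Return (g, s, t) with s*a + t*b = g and g monic."""
    r0, r1, s0, s1, t0, t1 = trim(a), trim(b), [1], [], [], [1]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, neg(mul(q, s1)))
        t0, t1 = t1, add(t0, neg(mul(q, t1)))
    c = [pow(r0[-1], -1, P)]
    return mul(r0, c), mul(s0, c), mul(t0, c)


def cantor_add(D1, D2, f, g):
    """Algorithm 1 restricted to h = 0 and gcd(a1, a2) = 1."""
    (a1, b1), (a2, b2) = D1, D2
    d, s1, s2 = xgcd(a1, a2)                       # step 1 with d = 1
    assert d == [1], "sketch only covers coprime a1, a2"
    a = mul(a1, a2)                                # step 2
    b = divmod_poly(add(mul(mul(s1, a1), b2),
                        mul(mul(s2, a2), b1)), a)[1]   # step 3
    while len(a) - 1 > g:                          # steps 5 to 9
        a = divmod_poly(add(f, neg(mul(b, b))), a)[0]  # exact division
        b = divmod_poly(neg(b), a)[1]
    return mul(a, [pow(a[-1], -1, P)]), b          # a made monic (added here)


def toy_curve():
    """f = prod_{i=0..8}(u - i) + c^2, so (i, c) and (i, -c) lie on v^2 = f(u)."""
    base = [1]
    for i in range(9):
        base = mul(base, [-i, 1])
    for c in range(1, P):
        f = add(base, [c * c])
        df = trim([k * f[k] for k in range(1, len(f))])
        if xgcd(f, df)[0] == [1]:                  # keep only square-free f
            return f, c
    raise ValueError("no square-free f found")


f, c = toy_curve()


def build(points):
    """Sum of weight-1 divisors div(u - x, y) for points with distinct x."""
    D = None
    for x, y in points:
        Dp = ([-x % P, 1], [y % P])
        D = Dp if D is None else cantor_add(D, Dp, f, 4)
    return D


D1 = build([(0, c), (1, -c), (2, c), (3, c)])
D2 = build([(4, -c), (5, c), (6, -c), (7, c)])
a3, b3 = cantor_add(D1, D2, f, 4)
print("a3 =", a3, "b3 =", b3)
```

The two weight-four inputs give a product $a_1 a_2$ of degree 8, so the reduction loop has to run before the result is again a reduced divisor with $\deg a_3 \le 4$ and $a_3$ dividing $b_3^2 - f$.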

3.3 Harley’s Algorithm 

The algorithms given in the previous section for the group operations in the 
Jacobian of HEC require the use of polynomial arithmetic for polynomials with 
coefficients in the definition field. An alternative approach for genus-2 curves 
was proposed by Harley in [15]. Harley computes the necessary coefficients from 
the steps of Cantor’s algorithm directly in the definition field without the use of 
polynomial arithmetic, resulting in a faster execution time. 

In [15], the authors found out that it is essential to know the weight of the
input divisor to determine explicit formulae. For each case, implementations of
different explicit formulae are required. However, for practical purposes it is
sufficient to only consider the most frequent cases¹, which occur with probability
$1 - O(1/q)$, where $q$ is the group order. General formulae for the most frequent
case for genus-2 curves and arbitrary characteristic were presented in [23].

Algorithms 2 and 3 describe the most frequent case of group addition and
doubling for genus-4 curves, respectively.

In Section 5 we develop for the first time explicit formulae of Cantor’s Algo- 
rithm for genus-4 curves. 

4 Security of the Implemented HECC 

4.1 Security of HECC with High Genera 

The DLP on $J_C(\mathbb{F})$ can be stated as follows: given two divisors $D_1, D_2 \in J_C(\mathbb{F})$,
determine the smallest integer $m$ such that $D_2 = m D_1$, if such an $m$ exists.

¹ For addition the inputs are two co-prime polynomials and for doubling the input is
a square free polynomial.

Algorithm 2 Most Frequent Case for Group Addition (g=4)

Require: $D_1 = \mathrm{div}(a_1, b_1)$, $D_2 = \mathrm{div}(a_2, b_2)$
Ensure: $D_3 = \mathrm{div}(a_3, b_3) = D_1 + D_2$

1: $k = (f - b_1 h - b_1^2) / a_1$ (exact division)
2: $s = (b_2 - b_1) / a_1 \bmod a_2$
3: $z = s a_1$
4: $a' = (k - s(z + h + 2 b_1)) / a_2$ (exact division)
5: $a' = a'$ made monic
6: $b' = -(h + z + b_1) \bmod a'$
7: $a_3 = (f - b' h - b'^2) / a'$ (exact division)
8: $b_3 = -(b' + h) \bmod a_3$

Algorithm 3 Most Frequent Case for Group Doubling (g=4)

Require: $D_1 = \mathrm{div}(a_1, b_1)$
Ensure: $D_2 = \mathrm{div}(a_2, b_2) = 2 D_1$

1: $k = (f - b_1 h - b_1^2) / a_1$ (exact division)
2: $s = k / (h + 2 b_1) \bmod a_1$
3: $a' = s^2 + (k - s(h + 2 b_1)) / a_1$ (exact division)
4: $a'' = a'$ made monic
5: $b'' = -(h + s a_1 + b_1) \bmod a''$
6: $a_2 = (f - b'' h - b''^2) / a''$ (exact division)
7: $b_2 = -(b'' + h) \bmod a_2$



The Pollard rho method and its variants [35, 45, 13] solve the DLP with
complexity $O(\sqrt{n})$ in generic groups of order $n$. In [11, 37] attacks against special
cases of HECC were discovered with complexity smaller than $O(\sqrt{n})$. An
algorithm to compute the DL in subexponential time for sufficiently large
genera and its variants were published in [1, 10, 7, 14, 8]. The complexity
of this algorithm is only better than that of Pollard's rho method for $g \ge 3$. In [14]
it is shown that index-calculus algorithms in the Jacobian of HEC have a lower
complexity than the Pollard rho method for curves of genus greater than 4. Recent
results by Thériault [44] show progress in attacks against HEC of genus 4.
An asymptotic running time of $O(q^{14/9})$ compared to $O(q^{2})$ for Pollard's rho
can be achieved, where $q$ denotes the size of the underlying field. However, an
actual attack on curves of group order $\approx 2^{128}$
is currently infeasible due to high storage usage. Furthermore, the values given
in [44] are asymptotic; thus, large constant factors might influence the running
times for small group orders.
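
As a rough orientation, the asymptotic exponents quoted above can be turned into bit counts for the 32-bit field considered in this paper. The sketch below is only a back-of-the-envelope comparison: it assumes the exponents apply to the field size $q$, ignores all constant factors and the storage requirements mentioned above, and uses hypothetical function names.

```python
def log2_work_pollard_rho(field_bits: float, genus: int) -> float:
    """log2 of sqrt(n) for a group of order about q**genus."""
    return genus * field_bits / 2


def log2_work_index_calculus_genus4(field_bits: float) -> float:
    """log2 of q**(14/9), the asymptotic genus-4 figure quoted from [44]."""
    return field_bits * 14 / 9


field_bits = 32  # genus-4 curve with group order of about 2^128
print("Pollard rho:    2^%.1f" % log2_work_pollard_rho(field_bits, 4))
print("index calculus: 2^%.1f" % log2_work_index_calculus_genus4(field_bits))
```

Because constants and memory are ignored, the gap between the two exponents should be read as an asymptotic indication rather than as a concrete attack cost.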

4.2 Security of 128-bit HECC 

In [26], Lenstra and Verheul argue that for commercial security in the year
2003, 136-bit ECC should be considered. Furthermore, the authors state that 
ECC using 136-bit keys are as secure as 1068-bit keys for RSA or DSS. This
notion of commercial security is based on the hypothesis that a 56-bit block
cipher offered adequate security in 1982. 

It is also worth pointing out that the factorization of the 512-bit RSA challenge
took only about 2% of the time required to break the ECC2K-108 challenge
(or to break DES). This implies that ECC or HECC in groups of order $2^{128}$ offer
far more security than a 512-bit RSA system. Nevertheless, RSA with a 512-bit
key is still in use, for example in fielded smart card applications.

5 First Explicit Formulae for Genus-4 HECC

For the derivation of the explicit formulae, all polynomial calculations have to 
be mapped onto field operations. 

The work at hand is the first approach using explicit formulae to optimize
the group operations for genus-4 HEC. Table 7 presents the explicit formulae for
a group addition and Table 8 those for a group doubling. The complexity of the
formulae shown in the tables is based on the assumption that the coefficients $h_i$
of $h(x) = h_4 x^4 + h_3 x^3 + h_2 x^2 + h_1 x + h_0$ are from $\{0, 1\}$. In addition, we can
set $f_8$ to zero by substituting $x' = x + f_8/9$. Thus, all multiplications with the
coefficients $h_i$ for $i \in \{0, 1, 2, 3, 4\}$ and $f_8$ are neglected in the total operation
count. A comparison of the computational complexity of our approach with the
results of previous work on genus-4 curves is given in Table 2.

An extensive description on the methodology to reduce the complexity of the 
group operation can be found in [22, 23, 33]. 

A detailed analysis of the explicit formulae gives rise to certain types of curves
with good properties, i.e. optimum performance regarding the number of required
field operations for the execution of the group operations. As a result of
this analysis, curves of the form $y^2 + y = f(x)$ over extension fields of characteristic
two turn out to be ideal. Unfortunately this kind of curve is supersingular
and therefore not suited for use in cryptography [14, 12, 40]. Thus, we propose
to use HEC of the form $y^2 + xy = f(x)$ over $\mathbb{F}_{2^n}$, which seem to be the
best choice without any security limitations. With these curves, we can save 12
multiplications in the case of the group addition and 118 multiplications for the
group doubling.

The following comparison is based on the assumption that a scalar multiplication
with an n-bit scalar is realized by the sliding window method. Hence, the
approximate cost of the scalar multiplication is $n \cdot \text{doublings} + 0.2 \cdot n \cdot \text{additions}$
for a 4-bit window size [2]. For arbitrary curves over fields of general characteristic,
we achieve a 24% improvement² compared to the results presented in [32].
In the case of HEC over $\mathbb{F}_{2^n}$ with $h(x) = x$, we can reach an improvement of up
to 60%² compared to the best known formulae for genus-4 curves. When comparing
our result to the original formulae presented by Cantor, the speed-up is
72% and 47% for curves with $h(x) = x$ and general curves², respectively.
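
The percentages above follow from the operation counts of Table 2 together with the cost model $n \cdot \text{doublings} + 0.2 \cdot n \cdot \text{additions}$. The following Python sketch recomputes them. It assumes, as in footnote 2, that one inversion costs about 8 multiplications, and it additionally assumes that squarings cost the same as multiplications; the dictionary layout and function names are hypothetical.

```python
INVERSION_IN_MULTS = 8    # footnote 2: one inversion is about 8 multiplications
ADDITIONS_PER_BIT = 0.2   # sliding window method with a 4-bit window

# (inversions, multiplications + squarings) for (addition, doubling), from Table 2
TABLE_2 = {
    "Cantor": ((6, 386), (6, 359)),
    "Nagao": ((2, 289), (2, 268)),
    "this work, general": ((2, 160 + 4), (2, 193 + 16)),
    "this work, h(x) = x": ((2, 148 + 6), (2, 75 + 14)),
}


def cost_in_mults(operation):
    """Cost of one group operation, expressed in field multiplications."""
    inversions, mults = operation
    return inversions * INVERSION_IN_MULTS + mults


def cost_per_scalar_bit(name):
    """Average cost per scalar bit: one doubling plus 0.2 additions."""
    addition, doubling = TABLE_2[name]
    return cost_in_mults(doubling) + ADDITIONS_PER_BIT * cost_in_mults(addition)


def improvement(new, old):
    """Relative saving of formulae `new` over formulae `old`."""
    return 1 - cost_per_scalar_bit(new) / cost_per_scalar_bit(old)


for new in ("this work, general", "this work, h(x) = x"):
    for old in ("Nagao", "Cantor"):
        print(f"{new} vs {old}: {100 * improvement(new, old):.0f}%")
```

With these inputs, the four printed values round to the 24%, 47%, 60%, and 72% quoted in the text.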

² We assumed that the computation time for one inversion is approximately the same
as for 8 multiplications. This value is supported by several implementations targeting
HECC on 32-bit processors, see Appendix A.

Table 2. Complexity comparison of the group operations on HEC of genus four

                         field           curve                                   cost
                         characteristic  properties                              addition         doubling
Cantor [5]               general                                                 6I + 386M/S      6I + 359M/S
Nagao [32]               odd             $h(x) = 0$, $f_i \in \mathbb{F}_2$      2I + 289M/S      2I + 268M/S
This work (Tables 7, 8)  general         $h_i \in \mathbb{F}_2$, $f_8 = 0$       2I + 160M + 4S   2I + 193M + 16S
                         two             $h(x) = x$, $f_8 = 0$                   2I + 148M + 6S   2I + 75M + 14S



6 HECC Implementation 

We implemented the derived explicit formulae for genus-4 HECC from Section 5 
on a Pentium4 and on an ARM microprocessor. This contribution is the first to 
implement explicit formulae for genus-4 HECC and is also the first to run this 
cryptosystem on an embedded processor such as the ARM7TDMI.

6.1 Methodology 

This subsection provides a short overview of the tools and the methodology used,
which finally led to a successful implementation of the group operations.

The methodology is as follows: 

1. Test the explicit formulae: NTL-based [41] implementation of Cantor’s algo- 
rithm and of the new explicit formulae. 

2. Speeding up the implementation: We developed our own library for the re- 
quired field and group operations. 

3. Testing the code on the Pentium. 

4. Porting to the ARM: The code was loaded into the ARM7TDMI@80MHz
(ARMulator).

5. Running and testing genus-4 HECC on the ARM7TDMI (ARMulator). 

6. Detailed timing analysis for different field sizes and curves. 

6.2 The ARM Processor 

As primary target platform, we chose the popular embedded 32-bit ARM microprocessor,
which can be found in mobile communication devices and consumer
electronics. ARM stands for Advanced RISC Machine and shows a typical RISC
architecture with additional features like flag-dependent instruction execution.
The ARM7 is based on a von Neumann architecture. It is well suited for small handheld
devices such as PDAs. As a reference and for testing, we also implemented all
cryptosystems on a Pentium4.

7 Results 

In this section we present our implementation results for the ARM and the 
Pentium. In the first part we discuss our results for standard commercial secu- 
rity application, whereas the second part concentrates on low cost security with 
genus-4 curves. 

7.1 Pentium and ARM Timings

Tables 3 and 4 present the execution times for genus-4 HECC operations on
the ARM processor and the Pentium, respectively. We provide timings for the 
group addition, group doubling and the scalar multiplication for different group 
orders. In addition we present the timings for genus-3 HECC, genus-2 HECC, 
and ECC using the same underlying library running on the same processor for 
comparison [34] . The formulae used for HECC implementation of genera two [23] 
and three [34] are to our knowledge the most efficient. In the case of ECC 
projective coordinates were used. 

Contrary to common belief, the results show that genus-4 curves are worth
implementing on 32-bit processors, though their performance for group orders
larger than $2^{160}$ is slightly worse than that of HECC with genus $g \le 3$. Considering
a group order of approximately $2^{190}$, genus-2 and genus-3 curves are a factor
of 1.66 and a factor of 2.08, respectively, better than the introduced formulae for
genus-4 HECC on the ARM7TDMI. Similarly, we obtain speed-ups of a factor
of 1.72 and a factor of 2.77 when using genus-2 and genus-3, respectively, compared
to genus-4 curves on the Pentium4 for the same group order. Compared
to ECC, genus-4 HECC performs a factor of 2.90 and a factor of 2.02 worse on
the Pentium and ARM7TDMI, respectively.

Table 3. Timings of group operations with ARMulator ARM7TDMI@80MHz (explicit
formulae)

Genus  Field                    Group order  Group addition  Group doubling  Scalar mult.
                                             in µs           in µs           in ms
4      $\mathbb{F}_{2^{40}}$    $2^{160}$    1315            740             172.43
4      $\mathbb{F}_{2^{41}}$    $2^{164}$    1304            734             174.80
4      $\mathbb{F}_{2^{44}}$    $2^{176}$    1319            747             190.07
4      $\mathbb{F}_{2^{46}}$    $2^{184}$    1323            752             199.49
4      $\mathbb{F}_{2^{47}}$    $2^{188}$    1310            745             201.89
4      $\mathbb{F}_{2^{63}}$    $2^{252}$    1372            797             286.51
3      $\mathbb{F}_{2^{63}}$    $2^{189}$    615             219             72.09
2      $\mathbb{F}_{2^{95}}$    $2^{190}$    511             504             121.49
1      $\mathbb{F}_{2^{191}}$   $2^{191}$    598             358             100

Table 4. Timings of group operations on the Pentium4@1.8GHz (explicit formulae)

Genus  Field                    Group order  Group addition  Group doubling  Scalar mult.
                                             in µs           in µs           in ms
4      $\mathbb{F}_{2^{40}}$    $2^{160}$    51.0            29.3            6.88
4      $\mathbb{F}_{2^{41}}$    $2^{164}$    49.7            27.6            6.96
4      $\mathbb{F}_{2^{44}}$    $2^{176}$    49.9            27.9            7.50
4      $\mathbb{F}_{2^{46}}$    $2^{184}$    50.1            28.2            7.92
4      $\mathbb{F}_{2^{47}}$    $2^{188}$    50.3            29.3            8.05
4      $\mathbb{F}_{2^{63}}$    $2^{252}$    51.6            29.8            8.43
3      $\mathbb{F}_{2^{63}}$    $2^{189}$    23.4            8.6             2.91
2      $\mathbb{F}_{2^{95}}$    $2^{190}$    19.1            18.8            4.68
1      $\mathbb{F}_{2^{191}}$   $2^{191}$    15.4            8.7             2.78

Notice that the relative performance of genus-4 HECC is always better on 
the ARM microprocessor compared to the Pentium. Hence, we conclude that 
genus-4 HECC are well suited for encryption in constrained environments
and we encourage the research community to put more effort into the further 
development of this system. 
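
The factors discussed in this section can be recomputed from the scalar multiplication timings in Tables 3 and 4. The short Python sketch below does so for the rows with a group order of about $2^{190}$ (genus 4 over $\mathbb{F}_{2^{47}}$, genus 3 over $\mathbb{F}_{2^{63}}$, genus 2 over $\mathbb{F}_{2^{95}}$, and ECC over $\mathbb{F}_{2^{191}}$); the variable and function names are hypothetical. One observation, offered with caution: the genus-3 ratio on the ARM evaluates to about 2.80 from the Table 3 entries, whereas the text quotes 2.08, so one of the two may contain a transposed digit.

```python
# Scalar multiplication timings in ms for group orders of about 2^190
ARM7TDMI_MS = {"genus-4": 201.89, "genus-3": 72.09, "genus-2": 121.49, "ECC": 100.0}
PENTIUM4_MS = {"genus-4": 8.05, "genus-3": 2.91, "genus-2": 4.68, "ECC": 2.78}


def genus4_slowdown(timings, other):
    """Factor by which genus-4 HECC is slower than `other` on one platform."""
    return round(timings["genus-4"] / timings[other], 2)


for other in ("genus-2", "genus-3", "ECC"):
    print(f"{other}: ARM {genus4_slowdown(ARM7TDMI_MS, other)}, "
          f"Pentium4 {genus4_slowdown(PENTIUM4_MS, other)}")
```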

7.2 Low Cost Security 

In contrast to the facts mentioned in Section 7.1, there is a clear benefit of using
genus-4 HECC over a 32-bit finite field. For hyperelliptic curve cryptosystems
with a group order of $\approx 2^{128}$, genus-4 group operations only require 32-bit field
arithmetic. Thus, the processor word is optimally utilized and the cryptosystem
does not need additional multi-precision arithmetic.

Analyzing the results for a group order of around $2^{128}$ as presented in Table 5,
a major advantage of genus-4 HEC is noticeable. The comparison of the timings
yields the following facts (numbers in parentheses are for the Pentium timings):

— Genus-4 HECC outperforms genus-2 HECC by a factor of 1.46 (1.24)

— HECC of genus 3 are slightly faster by a factor of 1.04 (1.08) than genus-4 
HECC 

As a result of this comparison, we suggest the use of genus-4 HECC for short
term security applications. Despite the high number of required field operations
compared to HEC with genus $g \le 3$, group operations on genus-4 HEC are easier
to implement. This fact relies on the relatively simple field arithmetic based on
operands of length no longer than 32 bits.



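Table 5. Timings on the ARM7TDMI@80MHz and Pentium4@1.8GHz for group
order $\approx 2^{128}$ (explicit formulae)

ARMulator ARM7TDMI@80MHz

Genus  Field                   Group order  Group addition  Group doubling  Scalar mult.
                                            in µs           in µs           in ms
4      $\mathbb{F}_{2^{32}}$   $2^{128}$    441             260             49.07
3      $\mathbb{F}_{2^{43}}$   $2^{129}$    603             199             47.13
2      $\mathbb{F}_{2^{63}}$   $2^{126}$    450             443             71.54

Pentium4@1.8GHz

Genus  Field                   Group order  Group addition  Group doubling  Scalar mult.
                                            in µs           in µs           in ms
4      $\mathbb{F}_{2^{32}}$   $2^{128}$    17.3            11.6            2.14
3      $\mathbb{F}_{2^{43}}$   $2^{129}$    23.6            8.2             1.98
2      $\mathbb{F}_{2^{63}}$   $2^{126}$    16.8            15.9            2.66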

8 Conclusions

The work at hand presents major improvements to hyperelliptic curve cryp- 
tosystems of genus 4. We were able to reduce the average complexity of a scalar 
multiplication by up to 60% compared to the currently best known formulae. 
The steps of the explicit formulae used to compute the group operations are 
presented for the first time for the case of genus-4 curves. 

Additionally, we showed with the first implementation of genus-4 HECC on
an embedded microprocessor that this cryptosystem is better suited for use
in embedded environments than on general purpose processors. Contrary to
common belief, our timing results show the practical relevance of this system.

Especially for applications based on asymmetric algorithms with group orders
around $2^{128}$, genus-4 HECC can be considered a viable choice. Not only do the
underlying field operations consist of simple 32-bit operations, but we also get
better performance than genus-2 curves and similar speed to genus-3 curves.

A Timings for Field Operations

In this section we present the timings for field operations for selected field orders. 



Table 6. Timings of field operations for genus-4 HECC

ARMulator ARM7TDMI@80MHz

Field                   Group order  Field inversion  Field multiplication  inversion /
                                     in µs            in µs                 multiplication
$\mathbb{F}_{2^{32}}$   $2^{128}$    26.8             2.6                   10.16
$\mathbb{F}_{2^{40}}$   $2^{160}$    49.2             7.3                   6.73
$\mathbb{F}_{2^{47}}$   $2^{188}$    77.5             7.3                   10.5

Pentium4@1.8GHz

Field                   Group order  Field inversion  Field multiplication  inversion /
                                     in ns            in ns                 multiplication
$\mathbb{F}_{2^{32}}$   $2^{128}$    1650             168                   9.82
$\mathbb{F}_{2^{40}}$   $2^{160}$    2519             413                   6.04
$\mathbb{F}_{2^{47}}$   $2^{188}$    3752             402                   9.33

B Explicit Formulae for Genus-4 HECC

Table 7. Explicit formulae for adding on a HEC of genus four 



Weight four reduced divisors $D_1 = (a_1, b_1)$ and $D_2 = (a_2, b_2)$
where: $a_1 = x^4 + a x^3 + b x^2 + c x + d$;
$b_1 = i x^3 + j x^2 + k x + l$;
$a_2 = x^4 + e x^3 + f x^2 + g x + h$;
$b_2 = m x^3 + n x^2 + o x + p$;
$h = h_4 x^4 + h_3 x^3 + h_2 x^2 + h_1 x + h_0$, where $h_i \in \{0, 1\}$;
$f = x^9 + f_7 x^7 + f_6 x^6 + f_5 x^5 + f_4 x^4 + f_3 x^3 + f_2 x^2 + f_1 x + f_0$



A weight four reduced divisor $D_3 = (a_3, b_3) = D_1 + D_2$



where: 



-|- ax' 3 -|- bx z -)- cx + d; 



+ i* + kx + l 



Procedure (For simplicity, only the implemented case is given.) 

3” 



Step 



Almost inverse ; 



: r / mod a 2 = i 



+ inv 2 x z + invjx + in-up: 



45 M + IS 46 M 



r 23 = e + a; r 22 = f + b; r 21 = g + c; r 20 = h + d; t 20 = 1; 

r 33 = r 23 ’ a + r 22 : r 32 = r 23 ' b + r 21 5 r 31 = r 23 ' c + r 20 5 r 30 = r 23 ’ d 5 

*31 = 1 > *30 = r 23 5 r 42 = r 33 ' r 22 + r 23 ‘ r 32 5 r 41 = r 33 ’ r 21 + r 23 ' r 31 ■ 

r 40 = r 33 ' r 20 + r 23 ' r 30 5 *41 = r 23 5 *40 = r 33 + r 23 : 

r 52 = r 42 ’ r 32 + r 33 ' r 41 • r 51 = r 42 ' r 31 + r 33 ' ^40 : 

r 50 = r 42 ’ r 30: *52 = r 33 ' *41 5 *51 = r 42 + r 33 ' *40! *50 = r 42 ' r 23 5 

r 61 = r 52 ’ r 41 + r 42 ' r 51i r 60 = r 52 ' r 40 + r 42 ‘ r 50 5 *62 = r 42 ’ *52! 

*61 = r 52 *41+ r 42 *5i: *60 = r 52 *40 + r 42 *50 5 r 71 = r 61 r 51 + r 52 r 60 5 

r 70 = r 61 ' r 50! *73 = r 52 ' *62! *72 = r 61 ' *52 + r 52 ' *615 

*71 = r 61 ’ *51 + r 52 ' *60 5 *70 = r 61 ' *50 5 r 80 = r 71 ' r 60 + r 61 ' r 70 5 

inv 3 = r 61 • £73 ; inv 2 = r 71 • t 62 + r 61 • t 72 ; 

inv 1 = r 71 ■ t 61 + t - 61 • t 71 ; inv Q = r 71 • t 60 + r 61 • t 70 ; 

s f = rs = (b 2 — bi)inv mod a 2 = Sgtc 3 + s 2 x 2 + x + Sq (Karatsuba): 



t a = m + i ; t b = n+j; t c = o + k; t d = p + l; t e = inv 3 ; tf = inv 2 ; 
irtvi ; t h = irtVQ-, t Q = t c ■ t g ; *1 = t b ■ tf ; t 2 = t a ■ t e ; *3 = t b ■ t g 
*4 = *c ' */ 5 *10 = t d ' t h’ *11 = (*c + *d) ' (*0 + *h) + *0 + *105 
*12 = (*t> +* — <*)• (*/ + *h) + *10 + *1 + *05 
*13 = (*a + *rf) ' (*e + *h) + *10 + *2 + *3 + *45 
*14 = (*a + *c) ' (*e + tg) + *2 + *0 +*15 

*15 = (*a + *{,) ' (*e + */) + *2 + *1 5 *16 = *2 5 *17 = *15 + e ' *16 5 

*18 = e ’ *17 + *16 ’ f + *145 s 3 = e ’ *18 + f ' *17 + 9 • *16 + *135 
/ ’ *18 + 9 ■ *17 + h ’ *16 + *12 5 

S 1 = 9 ■ *18 + h ’ *17 + *115 s Q = h ■ t 18 + t 10 ; 

3 2 / 

s = x + s 2 a: + x^x sq = s made monic: 

/ 1 / 2 

*1 = r 80 ■ s 3 ; w 6 = 1 1 ; w 7 = r 80 • kjq ; w 4 = r 80 ■ w 7 ; w 3 = s 3 



*0 — 



I + 7 M + 2 S 



u 5 = w 4 > s 0 = ■ 
: = sa ^ = x‘ + 



0 ' 



75 s 2 = s 2 ’ w 7 5 

^ -(- 23a: 3 + z 2 x 2 + z 4 x + zq (Karats.): 



*0 = c ■ s 15 *1 = b ■ s 25 zq = s 0 • d\ z 4 = (c + d) ■ (s x + s 0 ) + *0 + z 0 • 
z 2 = (b + d) (s 2 +so) + 2 0 +*l+*o; z 3 = (a + d)-(l + s 0 ) + z 0 + a + b s 1 +c s 2 ; 
z 4 = ( a + c ) ' (l + s l) + a + *0 + *l + s 05 z 5 = ( a + b ) ' (! + s 2) + a + *l + S 1 5 z 6 = 
a + s 2 ; z 7=1; 

a' = [s(z + m 4 (h + 2 bx)) - w 5 ((f - b 4 h - b‘l)/a 1 )]/a 2 



= x° +t 






' + u l x + uq: 



*1 = s 2 . w 4 ; t 2 = si ■ w 4 ; diff 4 = s 2 ■ zq + z 5 + s 4 + f ; 
diff 3 = g + z 4 + sq + s 2 z 5 + si ■ zq; 

dif f 2 = h + z 3 + s 2 ■ z 4 + s 4 ■ zq + sq • zq; 

diff 1 = s 2 ■ z 3 + S! ■ z 4 sq ■ z 5 + z 2 + w 5 ; 

diffQ = w 4 + s 2 ■ z 2 + si ■ z 3 + sq ■ z 4 + wq ■ a + z 4 ; u' 5 = zq + s 2 + e 

dif f 4 + e • u' b ; 



u' 3 = dif f 3 + e ■ 


u ' 4 + / ■ 


5 ’ u 2 


— dif / 2 + e • Ug + / ■ u ' 4 + g ■ itg ; 




u'i = diffi + e ’ 


u ' 2 + f-U 


3 + 9 


' “4 + h ■ u 5 : 




u'q = diffQ + e • 


U 1 + f ■ w 


2+9 


■ “3 + h ■ “4 1 




b = — (w 3 z + h + b^) mod 


a' = 


p^a: 5 + u^a: 4 + •p 3 ® ,;J + i^a^ + u^a: + Vq: 


12M 


*1 — u 5 + z 65 v q 


= w 3 • (+ 


u ' 5 ■ ti 


+ W4 + z 5)5 v ' 4 = w 3 ‘ (+ w 4 ’ *1 + "“3 + z 4)5 





J 3 = w 3 
J 1 = w 3 

>3 = (f 



(+' ti 3 ‘ *1 + u 2 
(+ u l ’ *1 + u o 

- b'h - b' 2 )/a' 



+ z 3 ) + i; v' 2=W3 . 

+ z i) + 1 + fc; v o = 

= u 34 x 4 + u 33 a: 3 + 1 



\' u 2 ‘ *1 + ' 
w 3 ■ (+ n o ' * 
1 32 x2 + “31 = 



z o) + l > 



diff 3 = 1; diff 2 = 
u 33 = dif f 3 + u 34 ■ 
M 31 = diffi + u 3 4 • 
u 30 = diff 0 + u 34 • 
a 3 = x 4 + dx 3 + bx 2 



diffi = f 7 5 diffQ = f 6 + t 



u 5 5 u 32 = diff 2 + u 34 
u 3 + u 33 ■ u 4 + n 32 ’ *4 5 
u 2 + u 33_ ‘ u 3 + u 32 • 4 + 
+ cx + d = a' 3 made monic: 



+ u 33 ■ 



*0 = u 34 5 d = u 33 
b 3 = —{b + h) mod a 3 = 



b = u 
— 7 t >3 . 



'0 5 c = U 31 ■ 
■ +kx + f: 



• tp -|- d ■ 4~ v\ — |— 1 ; l — d ■ tp -|- 



; + v 3 > 0 — b ■ tp + c • 



I + 7 M + 2 S 



i fields of arbitrary characteristic 
1 fields of characteristic 2 and h(x) = : 



27+148M+6S 



2I+160M+4S 
Table 8. Explicit formulae for doubling on HEC of genus four



A weight four reduced divisor $D_1 = (a_1, b_1)$



where: $a_1 = x^4 + ax^3 + bx^2 + cx + d$;
$b_1 = ix^3 + jx^2 + kx + l$
$h = h_4x^4 + h_3x^3 + h_2x^2 + h_1x + h_0$, where $h_i \in \{0, 1\}$;
$f = x^9 + f_7x^7 + f_6x^6 + f_5x^5 + f_4x^4 + f_3x^3 + f_2x^2 + f_1x + f_0$;



A weight four reduced divisor $D_2 = (a_2, b_2) = [2]D_1$



where 

= 7x3 



v + ax J + bx z + cx + d; 



b 2 = ix J + jx z + kx + Z 



Procedure (For simplicity, only the implemented case is given.) 



Step 



Almost inverse inv = r / (h -(- 2bi ) mod 
o 2 

= inv 3 x + inn 2 x + inrji + irinp: 
inv 3 = 1 ; inv 2 = a ; inn ^ = b; inn q = c; 

■=((/- fcfel ~ bi)/ai) mod ai = x 4 + Z 3 X 3 + Z 2 a;2 + z l a + Z Q : 



d*//2 = * + c; di// 0 = j + i; z ' 4 = a; z ' 3 = b + a z + fy, 

z 2 = dif f2 a ■ (.z% b) fQ; z'-^ = d -{- a (z 2 +c) + b- Zg+/5; 

z 0 = diffo + a • ( z i 4 " d) + b ■ 4 " c • z 3 4 " /4 > 

*3 = & + z 3 \ z 2 = c + z' 2 ; zi = d + z^ ; zp = z p ! 

s^ = zinv mod a^ = s^x^ + S 2 X ^ "I" s i £C "I - s p (Karatsuba): 

*1 = *1 ■ inn!; *2 = z 2 ‘ inv 2> *10 = z 0 ' inv 01 *16 = z 3 ' inv 3! 

*15 = ( 2 3 + z 2 ) ’ ( inv 3 + inv 2 ) + *16 + * 2 ! 

*14 = ( z 3 + z l) ■ (* n-u 3 + inn X ) + *16 + *1 + *2 5 

*13 = ( 2 3 + z 0) ' (i nv 3 + invQ ) + t 10 + *16 + z 2 ' inv 1 + Z 1 ' ; 

*12 = ( z 2 + z 0 ) ■ (i nv 2 + invQ ) + t 10 + t 2 + * 1 ; 

*11 = ( Z 1 + z 0) - ( inv 1 + inn 0 ) + *1 4 - *10! 

*3 = *15 + a ‘ *161 *4 = a ’ *3 + b ’ *16 + *14! Sp = d ■ *4 + tig; 

S 1 = c • *4 + d • *3 + *n ; s 2 = b ■ *4 + c • *3 + d ■ *ig + *12 ; 

So = a • *4 + b • *3 + c • tig + *13 ; 

3 2 / 

s = x + s 2 x + xix -(- sq = s made monic: 



I + 7 M + 2 S' 



*1 = d • Sg; in g = * ; m 7 = d • tug > uj 4 = d • toy ; m 3 = Sg • mg; 

™5 = '*"4! s 0 = s o ’ w 7 ! S 1 = s i ■ ™ 7 ; s 2 = s 2 ' ^ 7 ! 

G = sa! = x 7 + g^ 6 + + 9 £X^ + ggx 3 + 92 x ' 2 ‘ + 9 \x + gp (Karat.): 



*p = c • si ; *i = b ■ s 2 ; gp = sp • d; ffi = (c + d) • (si + sp) + tp + gp ; 

92 = (b + d)(s 2 + sp) + gp+*i+*p; g 3 = (a + d) • (l + sp) + gp +a + b- si +c- s 2 ; 
9 4 = (® + c) ■ (l + si)+a + *p+*i +sp; gg = (a + b) • (l + s 2 ) + a + *i +si ; gg = 
a + s 2 ; 

0 ' = a“ 2 [(G + U) 4 l>i) 2 + » 4 kC + m s (hbi - /)] 



= *6 + , 



' +uix + up: 




(mpz + h + bi) mod a / = 



* 3 = 1 

2 = *4 + *1 + *P ■ (*3 + *0): u i = w 5’ 

2 +■ *1 ‘ (*0 4- * 3 )! 



+ •U4® 4 + VoX 3 + t 



' + ^l 31 + V Q : 



— v-o- 1 1 “i; — - 5- ' -4- 1 -3- 1 ~2~ 1 “1“- 1 “O' 

5 = m 3 • (u' 4 + g 5 ); v' 4 = m 3 • (n^ • gg + g 4 ); n^ = m 3 ■ (n^ + g 3 ) + i; 

2 = ™3 ' ( w 2 ■ 96 + u \ 4- 92) + 3i v' ± = w 3 ■ (n^ ■ gg + u' Q + g 1 ) + 1 + fc ; 
,/ _ 1 1 7. 



J 0 = w 3 ■ ( U Q ‘ ^6 + 50) + l < 

*2 = (/ ~ ~ b ' 2 )/ a ' = ^ 24 a;4 + u 23 x ^ + u 22 x2 + u 21 3! + ^20 : 



4M + 3SQ 



diffz = 1; di// 2 = n^ ; diffi = / 7 ; di//p = /g + n^ + ng ; n 24 = 1 
n 23 = diff 3 ; n 22 = di// 2 + n 24 • n 2 i = diffi + w 23 - u 4 ! 

w 20 = dif /p + n 2 4 ■ n 2 + u 22 • u 4 ; 

4 3 — 2 — / 

a 2 = x + ox + bx + cx + d = a 2 made monic: 



*0 = 1 



: u 23 *0! b = ' u 22 *0! 5 = U 21 *0! d = u 20 ’ *0 • 



b 2 = — (b' + Zi) mod a 2 = ix " 3 + jx^ + kx + Z: 



*0 — v 4 4- U 5 ■ <12 ; i — a • *p + b • 



I + 7M +2S 



16M + 2 SQ 



1 fields of arbitrary characteristic 
1 fields of characteristic 2 with h(x) = x 



2/+75M+14S 



2/+193M+16S 
Abstract. We propose a simple algorithm to select group generators
suitable for pairing-based cryptosystems. The selected parameters are 
shown to favor implementations of the Tate pairing that are at once 
conceptually simple and efficient, with an observed performance about 2 
to 10 times better than previously reported implementations, depending 
on the embedding degree. Our algorithm has beneficial side effects: var- 
ious non-pairing operations become faster, and bandwidth may be saved. 

Keywords: pairing-based cryptosystems, group generators, elliptic curves, 
Tate pairing. 



1 Introduction 

Pairing-based cryptosystems are currently one of the most active areas of re- 
search in elliptic curve cryptography, as we see from the abundance of recent 
literature on the subject. This interest is not unfounded, as previously unsolved 
problems have been cracked by using pairings. 

To date, most suitable pairings are based on the Tate pairing over certain
elliptic curve groups, a notable exception being that of Boneh, Mironov and
Shoup [6] based on the Strong RSA assumption. Unfortunately, the Tate pairing
is an expensive operation and is often the bottleneck in such systems.

Efficient pairings for supersingular curves have been proposed [2, 9, 12]. How- 
ever, there is a widespread feeling that supersingular curves should be avoided 
whenever possible, as they may be more susceptible to attacks than ordinary 
curves. Moreover, for technical reasons, one is often forced to use fields of small 
characteristic [.6, section 5.2.2], which are more vulnerable to Coppersmith’s 
discrete logarithm attack [7] . Protecting against this attack increases bandwidth 
requirements (larger fields), and while this may not be an issue in some situa- 
tions, it is a central concern in many cases (e.g. short BLS signatures [5]). 



M. Matsui and R. Zuccherato (Eds.): SAC 2003, LNCS 3006, pp. 17—25, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 






Paulo S. L. M. Barreto et al. 



Thus we would like to find similar optimizations for ordinary curves over 
fields of large characteristics containing subgroups of manageable embedding 
degree [3, 8, 18]. 

We show how to select groups in nonsupersingular curves where many op- 
timizations proposed for supersingular curves [2] have a counterpart, and ob- 
tain running times that are up to ten times better than previously reported 
results [13]. In particular, we show how to perform elimination of irrelevant fac- 
tors and denominators during the computation of the Tate pairing, which is 
rendered conceptually simpler and substantially more efficient. Additionally, it 
turns out that operations of pairing-based schemes that do not rely on pairings, 
such as key generation, become more efficient with our choice of groups. 

This paper is organized as follows. Section 2 recalls some concepts essential 
to the discussion of pairings. Section 3 describes our group selection algorithm. 
Section 4 explains how the selected groups lead to efficient implementation of 
the Tate pairing. We compare our results with previous work in Section 5, and 
present our conclusions in Section 6. 

2 Preliminaries 

A subgroup $G$ of (the group of points of) an elliptic curve $E(\mathbb{F}_q)$
is said to have embedding degree $k$ if its order $r$ divides $q^k - 1$, but
does not divide $q^i - 1$ for all $0 < i < k$. We assume $k > 1$. The group
$E[r] \cong \mathbb{F}_r \times \mathbb{F}_r$ of $r$-torsion points lies in
$E(\mathbb{F}_{q^k})$ [1].
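The definition of the embedding degree can be checked numerically. The following is a minimal sketch, assuming $r$ is a prime that does not divide $q$; the function name `embedding_degree` and the sample values are illustrative and not taken from the paper.

```python
def embedding_degree(q, r):
    """Smallest k >= 1 such that r divides q**k - 1.

    Assumes r is prime and does not divide q, so that q is invertible
    modulo r and the loop below terminates.
    """
    if q % r == 0:
        raise ValueError("r must not divide q")
    k, power = 1, q % r
    while power != 1:
        power = power * q % r  # power == q**k mod r after the increment
        k += 1
    return k


# Illustrative values only: r = 3 divides 11**2 - 1 = 120 but not 11 - 1 = 10.
print(embedding_degree(11, 3))
```

For cryptographic sizes the same loop still works, since only the multiplicative order of $q$ modulo $r$ is computed and that order is small whenever the embedding degree is manageable.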

In what follows, let $\mathbb{F}_q$ be a field of odd characteristic and
$E(\mathbb{F}_q)$ an elliptic curve containing a subgroup of prime order $r$
with embedding degree $k$, and assume that $r$ and $k$ are coprime.

2.1 The Twist of a Curve 

Let $E(\mathbb{F}_q)$ be given by the short Weierstraß equation
$y^2 = x^3 + ax + b$, let $d$ be a factor of $k$ and let
$v \in \mathbb{F}_{q^d}$ be some quadratic non-residue. The twist of $E$ over
$\mathbb{F}_{q^d}$ is the curve $E'(\mathbb{F}_{q^d}) : y^2 = x^3 + v^2ax + v^3b$.
The orders of the groups of rational points of these curves satisfy the
relation $\#E(\mathbb{F}_{q^d}) + \#E'(\mathbb{F}_{q^d}) = 2q^d + 2$
[4, section III.3].
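The relation between the orders of a curve and its twist can be verified by brute force for tiny fields. The sketch below assumes $d = 1$ and a prime field $\mathbb{F}_p$; the helper names `count_points` and `twist_coefficients`, and the sample curve, are hypothetical and serve only as illustration.

```python
def count_points(p, a, b):
    """Number of points on y^2 = x^3 + a*x + b over F_p (p an odd prime), including O."""
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        if rhs == 0:
            total += 1  # single point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:  # Euler's criterion: rhs is a square
            total += 2
    return total


def twist_coefficients(p, a, b):
    """Coefficients (v^2 a, v^3 b) of the twist, using the smallest non-residue v."""
    v = next(u for u in range(2, p) if pow(u, (p - 1) // 2, p) == p - 1)
    return (v * v * a) % p, (v * v * v * b) % p


p, a, b = 13, 2, 3  # toy example, not a cryptographic curve
a_t, b_t = twist_coefficients(p, a, b)
n, n_t = count_points(p, a, b), count_points(p, a_t, b_t)
print(n, n_t, n + n_t == 2 * p + 2)
```

The check works because substituting $x = vu$ turns the right-hand side of the twist into $v^3$ times that of the original curve, and $v^3$ is a non-residue, so every $x$ that yields two points on one curve yields none on the other.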

In the above equation, if $v$ is instead a quadratic residue, then it is easy
to check that an isomorphism $E \to E'$ given by
$(X, Y) \mapsto (vX, v\sqrt{v}\,Y)$ exists.

2.2 Divisors and the Tate Pairing 

For our purposes, a divisor on $E$ is a formal sum
$D = \sum_{P \in E(\mathbb{F}_{q^k})} n_P (P)$ where $n_P \in \mathbb{Z}$.

The set of points $P \in E(\mathbb{F}_{q^k})$ such that $n_P \neq 0$ is called
the support of $D$. The degree of $D$ is the value $\deg(D) = \sum_P n_P$. The
null divisor, denoted 0, has all $n_P = 0$. The sum of two divisors
$D = \sum_P n_P (P)$ and $D' = \sum_P n'_P (P)$ is the divisor
$D + D' = \sum_P (n_P + n'_P)(P)$.

Given a nonzero rational function
$f : E(\mathbb{F}_{q^k}) \to \mathbb{F}_{q^k}$, the divisor of $f$ is the
divisor $(f) = \sum_P \mathrm{ord}_P(f)(P)$ where $\mathrm{ord}_P(f)$ is the
multiplicity of $f$ at $P$. It follows from this definition that
$(fg) = (f) + (g)$ and $(f/g) = (f) - (g)$ for any two nonzero rational
functions $f$ and $g$ defined on $E$; moreover, $(f) = 0$ if and only if $f$
is a nonzero constant.

We say two divisors $D$ and $D'$ are equivalent, $D' \sim D$, if there exists
a function $g$ such that $D' = D + (g)$. For any function $f$ and any divisor
$D = \sum_P n_P (P)$ of degree zero, we define $f(D) = \prod_P f(P)^{n_P}$.

The Tate pairing is a bilinear mapping
$e : E(\mathbb{F}_q)[r] \times E(\mathbb{F}_{q^k}) \to \mathbb{F}^*_{q^k}$.
Specifically, let $P \in E(\mathbb{F}_q)$ be a point of order $r$, let $f$ be
a function whose divisor satisfies $(f) = r(P) - r(O)$, let
$Q \in E(\mathbb{F}_{q^k})$, and let $D \sim (Q) - (O)$ be a divisor whose
support is disjoint from the support of $(f)$. We define the (reduced) Tate
pairing as

$e(P, Q) = f(D)^{(q^k - 1)/r}$.

One can show [11] that this mapping is indeed bilinear, and also nondegenerate 
for linearly independent P and Q if r divides the order of Q. 

More generally, if $D'$ is a divisor satisfying $D' \sim (P) - (O)$ then we
can substitute any $f'$ for $f$ such that $(f') = rD'$, so long as the support
of $D'$ is disjoint from that of $D$.

Note that raising $f(D)$ to $(q^k - 1)/r$ ensures that the result is either 1
or an element of order $r$. This property is useful in efficiently preventing
small subgroup attacks [15]. There is no need to multiply $Q$ by a large
cofactor to avoid these attacks, as checking the pairing value is sufficient.

2.3 The Frobenius Endomorphism 

The Frobenius endomorphism is the mapping
$\Phi : E(\mathbb{F}_{q^k}) \to E(\mathbb{F}_{q^k})$,
$(X, Y) \mapsto (X^q, Y^q)$. Thus a point $P \in E(\mathbb{F}_{q^k})$ is
defined over $\mathbb{F}_{q^i}$ if and only if $\Phi^i(P) = P$; in particular,
$\Phi^k(P) = P$ for any $P \in E(\mathbb{F}_{q^k})$.



2.4 The Trace Map 

The trace map is the mapping
$\mathrm{tr} : E(\mathbb{F}_{q^k}) \to E(\mathbb{F}_q)$ defined as
$\mathrm{tr}(P) = P + \Phi(P) + \Phi^2(P) + \cdots + \Phi^{k-1}(P)$. We have
$\mathrm{tr}(\Phi(P)) = \Phi(\mathrm{tr}(P)) = \mathrm{tr}(P)$ for any
$P \in E(\mathbb{F}_{q^k})$ (which shows that the range of the map is indeed
$E(\mathbb{F}_q)$).

We describe the two eigenspaces of the trace map on E[r]. The eigenvalues 
are k and 0. 

Lemma 1. The $k$-eigenspace of the trace map is $E(\mathbb{F}_q)[r]$.

Proof. Clearly, all points $R \in E(\mathbb{F}_q)[r]$ satisfy
$\mathrm{tr}(R) = [k]R$, hence we only need to show that all points
$R \in E[r]$ such that $\mathrm{tr}(R) = [k]R$ are defined over
$\mathbb{F}_q$. Indeed, if $\mathrm{tr}(R) = [k]R$, then
$\Phi(\mathrm{tr}(R)) = \Phi([k]R) = [k]\Phi(R)$, but since
$\Phi(\mathrm{tr}(R)) = \mathrm{tr}(R)$, it follows that
$[k]\Phi(R) = \mathrm{tr}(R) = [k]R$ and thus $[k](\Phi(R) - R) = O$. As $k$
is coprime to the order of $R$, necessarily $\Phi(R) - R = O$, hence $R$ must
be defined over $\mathbb{F}_q$, that is, $R \in E(\mathbb{F}_q)[r]$. □

It is easy to verify that for any $R \in E(\mathbb{F}_{q^k})$ the point
$Q = R - \Phi(R)$ satisfies $\mathrm{tr}(Q) = O$. This provides a way of
generating points of trace zero. Since at least one finite point $Q$ can be
constructed in this fashion (provided $k > 1$), we see that the other
eigenvalue of the trace map is indeed zero. We know that this space must be
one-dimensional, since the other dimension has been accounted for by
$E(\mathbb{F}_q)[r]$.

We now describe the eigenspaces of the Frobenius map on $E[r]$. The
characteristic polynomial of the Frobenius endomorphism is the polynomial
$\pi(u) = u^2 - tu + q$. The value $t$ is called the trace of the Frobenius
endomorphism, not to be confused with the trace map. The polynomial $\pi$
factorizes as $\pi(u) = (u - 1)(u - q) \pmod{r}$, so the eigenvalues are 1
and $q$.

Lemma 2. The 1-eigenspace of $\Phi$ is $E(\mathbb{F}_q)[r]$.

Proof. A point of $E(\mathbb{F}_{q^k})$ is fixed under $\Phi$ if and only if
it lies in $E(\mathbb{F}_q)$. □

Lemma 3. The $q$-eigenspace of $\Phi$ consists of all points $R \in E[r]$
satisfying $\mathrm{tr}(R) = O$.

Proof. If a point $R$ satisfies
$\mathrm{tr}(R) = (1 + \Phi + \cdots + \Phi^{k-1})R = O$, then
$\mathrm{tr}(\Phi(R)) = (\Phi + \cdots + \Phi^k)R = O$. In other words, the
points of trace zero are mapped to points of trace zero under $\Phi$ and hence
must constitute an eigenspace. As the 1-eigenspace has already been accounted
for, the set of points of trace zero must be the $q$-eigenspace of $\Phi$. □

3 Parameter Generation 

Assume k is even and set d = k/2. We propose a method for selecting group 
generators that makes the pairing more efficient, and additionally improves the 
performance of operations in pairing-based schemes that do not use the pairing, 
such as key generation. 

Let $E$ be given by $y^2 = x^3 + ax + b$, and consider its twist over
$\mathbb{F}_{q^d}$, namely, the curve
$E'(\mathbb{F}_{q^d}) : y^2 = x^3 + v^2ax + v^3b$ for some quadratic
non-residue $v \in \mathbb{F}_{q^d}$. In $\mathbb{F}_{q^k}$, $v$ is a quadratic
residue, which means the map
$\Psi : (X, Y) \mapsto (v^{-1}X, (v\sqrt{v})^{-1}Y)$ is an isomorphism that
maps the group of points of $E'(\mathbb{F}_{q^d})$ to a subgroup of points of
$E(\mathbb{F}_{q^k})$.

Let $Q' = (X, Y) \in E'(\mathbb{F}_{q^d})$, and set
$Q = \Psi(Q') = (v^{-1}X, (v\sqrt{v})^{-1}Y) \in E(\mathbb{F}_{q^k})$. By
construction, the $x$-coordinate of $Q$ is an element of $\mathbb{F}_{q^d}$,
allowing the denominator elimination optimization that will be described in
the next section. This suggests the following group selection algorithm.



Group Selection Algorithm: 

1. Randomly generate a point $P \in E(\mathbb{F}_q)$ of order $r$.

2. Randomly generate a point $Q' \in E'(\mathbb{F}_{q^d})$.



On the Selection of Pairing-Friendly Groups 






We view the domain of the Tate pairing as
$\langle P \rangle \times \langle Q \rangle$, where $Q = \Psi(Q')$. It may be
desirable to explicitly check that $e(P, Q) \neq 1$, but as this occurs with
overwhelming probability, in some situations it could be safe to skip this
check. Note that only $P$ is required to have order $r$.

Operations that do not use the pairing such as key generation and point
transmission can be performed using only arithmetic on $\mathbb{F}_{q^d}$.
Points of $E'(\mathbb{F}_{q^d})$ are mapped back to points on
$E(\mathbb{F}_{q^k})$ only when needed for a pairing computation. This avoids
many $\mathbb{F}_{q^k}$ operations and halves bandwidth requirements.

For instance, if $k = 2$, pairing-based protocols can be implemented using
$E(\mathbb{F}_q)$ arithmetic, readily available in a highly optimized form in
many code libraries, along with support for simple $\mathbb{F}_{q^2}$
operations for the pairing computation. For higher $k$, we suggest
implementing $\mathbb{F}_{q^k}$ as $\mathbb{F}_q[x]/R_k(x)$, where $R_k(x)$ is
the sparsest possible polynomial containing only terms of even degree. In this
case, elements in $\mathbb{F}_{q^d}$ are polynomials lacking any term of odd
degree.

3.1 Some Remarks on the Selected Groups 

We mention a few observations on the groups selected by our algorithm. 

Lemma 4. Let $Q = (X, Y) \in E(\mathbb{F}_{q^k})$ be a finite point. Then
$\Phi^d(Q) = -Q$ if and only if $X^{q^d - 1} = 1$ (i.e.
$X \in \mathbb{F}_{q^d}$) and $Y^{q^d - 1} = -1$.

Proof. Since $-Q = (X, -Y)$ (for a suitable Weierstraß form), we conclude that
$\Phi^d(X, Y) = (X^{q^d}, Y^{q^d}) = (X, -Y)$ if and only if
$X^{q^d - 1} = 1$ (i.e. $X \in \mathbb{F}_{q^d}$) and $Y^{q^d - 1} = -1$. □

Thus $\Psi(E'(\mathbb{F}_{q^d}))$ is precisely the group of points in
$E(\mathbb{F}_{q^k})$ satisfying $\Phi^d(Q) = -Q$, which is a subgroup of the
trace zero points of $E(\mathbb{F}_{q^k})$.

Hence an alternative way to pick $Q$ in our algorithm is to choose a random
$R \in E(\mathbb{F}_{q^k})$ and set $Q \leftarrow R - \Phi^d(R)$. However,
this is slower than finding points of $E'(\mathbb{F}_{q^d})$, and we also do
not obtain the bonus of speeding up non-pairing operations.

Lastly, we note that the above lemma can be used to show that $r$-torsion
points of trace zero have a special form.

Corollary 1. Let $Q = (X, Y) \in E(\mathbb{F}_{q^k})[r]$ be a finite point
with $\mathrm{tr}(Q) = O$. Then $X \in \mathbb{F}_{q^d}$ and
$Y^{q^d - 1} = -1$.

Proof. As $\mathrm{tr}(Q) = O$, the point $Q$ lies in the $q$-eigenspace of
the Frobenius map $\Phi$, that is, $\Phi(Q) = [q]Q$. We have
$q^d \equiv -1 \pmod{r}$, because $q^{2d} \equiv 1 \pmod{r}$ and $2d = k$ is
the smallest integer for which this holds. Thus $\Phi^d(Q) = -Q$. By Lemma 4
we have $X^{q^d - 1} = 1$ and $Y^{q^d - 1} = -1$. □
4 Tate Pairing Computation

We review Miller’s algorithm [17] for computing the Tate pairing and describe
how to optimize it for the subgroups constructed according to our algorithm.

Let $P \in E(\mathbb{F}_q)[r]$ and $Q \in E(\mathbb{F}_{q^k})$ be linearly
independent points. Let $f$ be the rational function with divisor
$(f) = r(P) - r(O)$. We wish to compute the Tate pairing
$e(P, Q) = f(D)^{(q^k - 1)/r}$, where $D$ satisfies $D \sim (Q) - (O)$, and
the support of $D$ does not contain $P$ or $O$.

For this section, instead of requiring $k$ to be even and setting $d = k/2$,
we generalize so that $d$ now represents any proper factor of $k$, that is,
$d \mid k$ and $d < k$.

Lemma 5. $q^d - 1$ is a factor of $(q^k - 1)/r$.

Proof. We start with the factorization
$q^k - 1 = (q^d - 1) \sum_{i=0}^{k/d - 1} q^{id}$. Since the embedding degree
is $k > 1$, we have $r \mid q^k - 1$ and $r \nmid q^d - 1$. Thus
$r \mid \sum_{i=0}^{k/d - 1} q^{id}$ and $q^d - 1$ survives as a factor of
$(q^k - 1)/r$. □

Corollary 2 (Irrelevant factors). One can multiply $f(D)$ by any nonzero
$x \in \mathbb{F}_{q^d}$ without affecting the pairing value.

Proof. To compute the pairing, $f(D)$ is raised to the exponent
$(q^k - 1)/r$. By Lemma 5, this exponent contains a factor $q^d - 1$, thus by
Fermat’s Little Theorem for finite fields [14, lemma 2.3],
$x^{(q^k - 1)/r} = 1$. □
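A small numerical sanity check of the divisibility claim and of the "irrelevant factor" property may help. This is only a sketch with toy values; the function name `final_exponent_split` and the parameter choices $(q, k, d, r) = (7, 6, 3, 43)$ and $(11, 2, 1, 3)$ are assumptions made for illustration.

```python
def final_exponent_split(q, k, d, r):
    """Return ((q^k - 1)/r, ((q^k - 1)/r) / (q^d - 1)) for a proper divisor d of k.

    Assumes r is a prime dividing q^k - 1 but not q^d - 1.
    """
    assert k % d == 0 and d < k
    assert (q**k - 1) % r == 0 and (q**d - 1) % r != 0
    exponent = (q**k - 1) // r
    quotient, remainder = divmod(exponent, q**d - 1)
    assert remainder == 0  # q^d - 1 divides the final exponent
    return exponent, quotient


print(final_exponent_split(7, 6, 3, 43))

# For d = 1 the subfield is F_q itself, so the corollary can be tested with
# plain modular arithmetic: every nonzero x in F_11 is killed by the exponent.
exponent, _ = final_exponent_split(11, 2, 1, 3)
print(all(pow(x, exponent, 11) == 1 for x in range(1, 11)))
```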

The next theorem generalizes a result originally established only for certain 
supersingular curves [2, Theorem 1]: 

Theorem 1. Let $P \in E(\mathbb{F}_q)[r]$ and $Q \in E(\mathbb{F}_{q^k})$ be
linearly independent points. Then $e(P, Q) = f(Q)^{(q^k - 1)/r}$.

Proof. Suppose $R \notin \{O, -P, Q, Q - P\}$ is some point on the curve. Let
$f'$ be a function with divisor $(f') = r(P + R) - r(R) \sim (f)$, so that
$e(P, Q) = f'((Q) - (O))^{(q^k - 1)/r}$. Because $f'$ does not have a zero or
pole at $O$, we have $f'((Q) - (O)) = f'(Q)/f'(O)$, and since $P$ has
coordinates in $\mathbb{F}_q$, we know that $f'(O) \in \mathbb{F}^*_q$.
Corollary 2 then ensures that $f'(O)$ is an irrelevant factor and can be
omitted from the Tate pairing computation, i.e.
$e(P, Q) = f'(Q)^{(q^k - 1)/r}$.

Now $(f') = r((P + R) - (R)) = r((P) - (O) + (g)) = (f) + r(g)$ for some
rational function $g$, since $(P + R) - (R) \sim (P) - (O)$. Thus
$f' = fg^r$, and because $Q$ is not a zero or pole of $f$ or $f'$ (so that
$g(Q) \in \mathbb{F}^*_{q^k}$ is well defined) it follows that
$f'(Q)^{(q^k - 1)/r} = f(Q)^{(q^k - 1)/r} g(Q)^{q^k - 1} = f(Q)^{(q^k - 1)/r}$. □

The case of linearly dependent $P$ and $Q$ is trivially handled, as then we
have $e(P, Q) = 1$.

In what follows, which we quote directly from Barreto et al. [2, Theorem 2],
for each pair $U, V \in E(\mathbb{F}_q)$ we define
$g_{U,V} : E(\mathbb{F}_{q^k}) \to \mathbb{F}_{q^k}$ to be (the equation of)
the line through points $U$ and $V$ (if $U = V$, then $g_{U,V}$ is the tangent
to the curve at $U$, and if either one of $U, V$ is the point at infinity $O$,
then $g_{U,V}$ is the vertical line at the other point). The shorthand $g_U$
stands for $g_{U,-U}$. In affine coordinates, $E : y^2 = x^3 + ax + b$, for
$U = (x_U, y_U)$, $V = (x_V, y_V)$ and $Q = (x, y)$, we have:

$g_{U,V}(Q) = 1$, $Q \in \langle P \rangle$.
$g_{U,U}(Q) = \lambda_1(x - x_U) + y_U - y$, $Q \notin \langle P \rangle$.
$g_{U,V}(Q) = \lambda_2(x - x_U) + y_U - y$, $Q \notin \langle P \rangle$, $U \neq V$.
$g_U(Q) = x - x_U$, $Q \notin \langle P \rangle$.

where

$\lambda_1 = \frac{3x_U^2 + a}{2y_U}$, $\lambda_2 = \frac{y_V - y_U}{x_V - x_U}$.

Lemma 6 (Miller’s formula). Let $P$ be a point on $E(\mathbb{F}_q)$ and $f_c$
be a function with divisor $(f_c) = c(P) - ([c]P) - (c - 1)(O)$,
$c \in \mathbb{Z}$. For all $a, b \in \mathbb{Z}$,
$f_{a+b}(Q) = f_a(Q) \cdot f_b(Q) \cdot g_{[a]P,[b]P}(Q) / g_{[a+b]P}(Q)$.

Proof. See Barreto et al. [2, Theorem 2]. □

Notice that $(f_0) = (f_1) = 0$, so that by Corollary 2 we can set
$f_0(Q) = f_1(Q) = 1$. Furthermore,
$f_{a+1}(Q) = f_a(Q) \cdot g_{[a]P,P}(Q) / g_{[a+1]P}(Q)$ and
$f_{2a}(Q) = f_a(Q)^2 \cdot g_{[a]P,[a]P}(Q) / g_{[2a]P}(Q)$. Recall that
$r > 0$ is the order of $P$. Let its binary representation be
$r = (r_t, \ldots, r_1, r_0)$ where $r_i \in \{0, 1\}$ and $r_t \neq 0$.
Miller’s algorithm computes $f(Q) = f_r(Q)$, $Q \notin \{O, P\}$, by coupling
the above formulas with the double-and-add method to calculate $[r]P$:



Miller’s Algorithm:

set $f \leftarrow 1$ and $V \leftarrow P$
for $i \leftarrow t - 1, t - 2, \ldots, 1, 0$ do {
    set $f \leftarrow f^2 \cdot g_{V,V}(Q) / g_{[2]V}(Q)$ and $V \leftarrow [2]V$
    if $r_i = 1$ then set $f \leftarrow f \cdot g_{V,P}(Q) / g_{V+P}(Q)$ and $V \leftarrow V + P$
}
return $f$

Miller’s algorithm can be simplified further if $k$ is even, as established by
the following generalization of a previous result [2, Theorem 2]:

Theorem 2 (Denominator elimination). Let $P \in E(\mathbb{F}_q)[r]$. Suppose
$Q = (X, Y) \in E(\mathbb{F}_{q^k})$ and $X \in \mathbb{F}_{q^d}$. Then the
$g_{[2]V}$ and $g_{V+P}$ denominators in Miller’s algorithm can be discarded
without changing the value of $e(P, Q)$.

Proof. The denominators in Miller’s formula have the form $g_U(Q) = x - u$,
where $x \in \mathbb{F}_{q^d}$ is the abscissa of $Q$ and
$u \in \mathbb{F}_q$ is the abscissa of $U$. Hence
$g_U(Q) \in \mathbb{F}_{q^d}$. By Corollary 2, they can be discarded without
changing the pairing value. □
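To make Miller's loop with discarded denominators concrete, here is a toy sketch for $k = 2$, $d = 1$. Everything specific in it is an assumption chosen for illustration and not taken from the paper: the curve $y^2 = x^3 + x$ over $\mathbb{F}_{11}$, the order $r = 3$, the representation of $\mathbb{F}_{p^2}$ as $\mathbb{F}_p[i]/(i^2 + 1)$, and all function names. With $v = -1$ the twist has the same equation, and the map $(X, Y) \mapsto (v^{-1}X, (v\sqrt{v})^{-1}Y)$ becomes $(X, Y) \mapsto (-X, iY)$, so $Q$ has its abscissa in $\mathbb{F}_p$.

```python
# Toy parameters, chosen only for illustration (far too small for real use).
p = 11   # p = 3 (mod 4): -1 is a non-residue, so F_{p^2} = F_p[i]/(i^2 + 1)
a = 1    # curve E: y^2 = x^3 + x over F_p, with #E(F_p) = p + 1 = 12
r = 3    # prime order of P; r divides p + 1, so the embedding degree is k = 2


def fp2_mul(u, v):
    """Multiply u = u0 + u1*i and v = v0 + v1*i in F_{p^2}."""
    return ((u[0] * v[0] - u[1] * v[1]) % p, (u[0] * v[1] + u[1] * v[0]) % p)


def fp2_pow(u, e):
    """Square-and-multiply exponentiation in F_{p^2}."""
    result = (1, 0)
    while e:
        if e & 1:
            result = fp2_mul(result, u)
        u = fp2_mul(u, u)
        e >>= 1
    return result


def slope(U, V):
    """Slope of the line through U and V (tangent if U == V); None if vertical."""
    (x1, y1), (x2, y2) = U, V
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if U == V:
        return (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    return (y2 - y1) * pow((x2 - x1) % p, -1, p) % p


def add(U, V):
    """Affine addition on E(F_p); None stands for the point at infinity O."""
    if U is None:
        return V
    if V is None:
        return U
    lam = slope(U, V)
    if lam is None:
        return None
    x3 = (lam * lam - U[0] - V[0]) % p
    return (x3, (lam * (U[0] - x3) - U[1]) % p)


def line(U, V, Q):
    """Line value g_{U,V}(Q) in F_{p^2}, for Q = (x, y*i) stored as the pair (x, y)."""
    if U is None or V is None or slope(U, V) is None:
        return (1, 0)  # vertical line: its value lies in F_p, an irrelevant factor
    lam = slope(U, V)
    return ((lam * (Q[0] - U[0]) + U[1]) % p, -Q[1] % p)


def tate(P, Q):
    """Reduced Tate pairing via Miller's loop, with all denominators discarded."""
    f, V = (1, 0), P
    for bit in bin(r)[3:]:  # bits r_{t-1}, ..., r_0 (the top bit is skipped)
        f = fp2_mul(fp2_mul(f, f), line(V, V, Q))
        V = add(V, V)
        if bit == "1":
            f = fp2_mul(f, line(V, P, Q))
            V = add(V, P)
    return fp2_pow(f, (p * p - 1) // r)


P = (5, 3)  # a point of order 3 on E(F_11)
Q = (6, 3)  # stands for (6, 3i), the image of P under (X, Y) -> (-X, iY)
e = tate(P, Q)
print(e, fp2_pow(e, r))
```

The pair printed first is the pairing value as (real part, coefficient of $i$); raising it to the power $r$ gives the identity, and replacing $P$ by $[2]P = (5, 8)$ squares it, which is the bilinearity one expects. Vertical lines, including the one in the final addition step where $V + P = O$, are replaced by 1 since their values lie in $\mathbb{F}_p$.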
Table 1. Complexity of computing the Tate pairing

algorithm             coordinates   k = 2, |q| = 512   k = 6, |q| = 171
[13]                  projective    20737.6M           33078.3M
ours, w/o precomp.    projective    4153.2M            15633.0M
ours, with precomp.   projective    2997.6M            14055.4M
ours, with precomp.   affine        1899.6M            11110.2M



5 Results 

To illustrate the effectiveness of our method for the computation of the Tate 
pairing, we compare our results with those of Izu and Takagi [13] for non- 
supersingular curves with k = 2 and k = 6. 

The computation of e(P, Q) requires all of the intermediate points computed 
during the scalar multiplication [r]P. If P is fixed, these can be precalculated 
and stored, with considerable savings. In this case affine coordinates are faster, 
and require less storage. Otherwise we follow Izu and Takagi [13] and use pro- 
jective coordinates. Additional savings could be obtained with the method of 
Eisentraeger, Lauter and Montgomery [10], but we have not implemented it. 

Table 1 summarizes the results, where M denotes the computing time of
a multiplication in $\mathbb{F}_q$, assuming that the time taken by one
squaring is about 0.8 M.

6 Conclusions 

We have shown how to select cryptographically significant groups where the Tate 
pairing can be efficiently implemented. 

Specifically, we have argued that the Tate pairing $e(P, Q)$ is most
efficiently calculated when $P \in E(\mathbb{F}_q)[r]$ and
$Q \in E(\mathbb{F}_{q^k})$ satisfies $\Phi^{k/2}(Q) = -Q$. We have also
provided an algorithm to choose such $P$ and $Q$ so that $e(P, Q)$ is
nondegenerate.

An interesting line of further research is the extension of our methods to 
hyperelliptic curves, possibly with enhancements. This has already been done 
for the supersingular case [9]. 
Abstract. Counting rational points on Jacobian varieties of hyperelliptic
curves over finite fields is very important for constructing hyperelliptic
curve cryptosystems (HCC), but known algorithms for general curves over given
large prime fields need very long running times. In this article, we propose
an extremely fast point counting algorithm for hyperelliptic curves of type
$y^2 = x^5 + ax$ over given large prime fields $\mathbb{F}_p$, e.g. 80-bit
fields. For these curves, we also determine the necessary condition to be
suitable for HCC, that is, to satisfy that the order of the Jacobian group is
of the form $l \cdot c$ where $l$ is a prime number greater than about
$2^{160}$ and $c$ is a very small integer. We show some examples of suitable
curves for HCC obtained by using our algorithm. We also treat curves of type
$y^2 = x^5 + a$ where $a$ is not a square in $\mathbb{F}_p$.



1 Introduction 

Let $C$ be a hyperelliptic curve of genus 2 over $\mathbb{F}_q$. Let $J_C$ be
the Jacobian variety of $C$ and $J_C(\mathbb{F}_q)$ the group of
$\mathbb{F}_q$-rational points of $J_C$. We call the group
$J_C(\mathbb{F}_q)$ the Jacobian group of $C$. Since $J_C(\mathbb{F}_q)$ is
a finite abelian group, we can construct a public-key cryptosystem with it.
This cryptosystem is called a “hyperelliptic curve cryptosystem (HCC)”. The
advantage of HCC over an elliptic curve cryptosystem (ECC) is that we can
construct a cryptosystem at the same security level as an elliptic one by
using a defining field of half the size. More precisely, we need a 160-bit
field to construct a secure ECC, but for HCC we only need an 80-bit field. The
order of the Jacobian group of a hyperelliptic curve defined over an 80-bit
field is about 160 bits. It is said that a secure HCC needs
$\#J_C(\mathbb{F}_q) = c \cdot l$ where $l$ is a prime number greater than
about $2^{160}$ and $c$ is a very small integer. We call a hyperelliptic curve
“suitable for HCC” if its Jacobian group has such a suitable order.

As in the case of ECC, computing the order of the Jacobian group
$J_C(\mathbb{F}_q)$ is very important for constructing HCC. But this is very
difficult for hyperelliptic curves defined over 80-bit fields and there are
very few results on it: Gaudry-Harley’s algorithm [9, 15] can compute the
order for random hyperelliptic curves over 80-bit fields, but their algorithm
needs a very long running time, e.g. 1 week or longer. For a hyperelliptic
curve with complex multiplication, there are known efficient algorithms (we
call them “CM-methods”) to construct a curve whose Jacobian group has
a 160-bit prime factor. But CM-methods also need a rather long time and do not
give an algorithm to compute the order of the Jacobian group over a given
defining field. There is another way. For special curves, it is possible to
obtain a fast point counting algorithm for given defining fields.
Buhler-Koblitz [2] obtained such an algorithm for special curves of type
$y^2 + y = x^n$ over prime fields $\mathbb{F}_p$ where $n$ is an odd prime
such that $p \equiv 1 \pmod{n}$.

In this article, we propose an extremely fast algorithm to compute the order
of the Jacobian group $J_C(\mathbb{F}_p)$ for hyperelliptic curves $C$ defined
by the equation $y^2 = x^5 + ax$ over large prime fields $\mathbb{F}_p$.
Curves of this type are different from Buhler-Koblitz’s curves [2]. Though the
curves of this type have complex multiplication, by using our algorithm we can
obtain suitable curves for HCC much faster than by using CM-methods. The
expected running time of our algorithm is $O(\ln^4 p)$. The program based on
our algorithm runs instantaneously on a system with a Celeron 600MHz CPU and
less than 1GB memory. It takes less than 0.1 seconds even for 160-bit prime
fields. Moreover, we study the reducibility of the Jacobian variety over
extension fields and the order of the Jacobian group for the above curves.
After these studies, we determine the necessary condition to be suitable for
HCC. In Section 5, we describe our algorithm and give some examples of
hyperelliptic curves suitable for HCC obtained by using it. In the last
section of this article, we treat other hyperelliptic curves of type
$y^2 = x^5 + a$, $a \in \mathbb{F}_p$. When $a$ is a square in
$\mathbb{F}_p$, the curve is a kind of Buhler-Koblitz curve [2]. Here we
consider the case that $a$ is not a square, which does not occur among
Buhler-Koblitz’s curves. We describe our point counting algorithm for this
type and show the result of a search for suitable curves for HCC. In fact,
Jacobian groups with prime order are obtained in a very short time over 80-bit
prime fields.



2 Basic Facts on Jacobian Varieties over Finite Fields 

Here we recall basic facts on the order of Jacobian groups of hyperelliptic curves 
over finite fields. ( cf. [9, 11] ) 



2.1 General Theory 

Let $p$ be an odd prime number, $\mathbb{F}_q$ a finite field of order
$q = p^l$ and $C$ a hyperelliptic curve of genus $g$ defined over
$\mathbb{F}_q$. Then the defining equation of $C$ is given as $y^2 = f(x)$
where $f(x)$ is a polynomial in $\mathbb{F}_q[x]$ of degree $2g + 1$.

Let $J_C$ be the Jacobian variety of a hyperelliptic curve $C$. We denote the
group of $\mathbb{F}_q$-rational points on $J_C$ by $J_C(\mathbb{F}_q)$. Let
$\chi_q(t)$ be the characteristic polynomial of the $q$-th power Frobenius
endomorphism of $C$. Then, the order $\#J_C(\mathbb{F}_q)$ is given by

$\#J_C(\mathbb{F}_q) = \chi_q(1)$.
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The following ”Hasse-Weil bound” is a famous inequality which bounds jj Jc(¥ q ): 

r(^-i) 25 i< #j(f 9 )<k^+i) 2 ®j. 
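For genus 2 the Hasse-Weil bound can be evaluated in exact integer arithmetic, since $(\sqrt{q} \pm 1)^4 = q^2 + 6q + 1 \pm 4(q+1)\sqrt{q}$. The following Python sketch is only an illustration; the function name is hypothetical and does not come from the original text.

```python
from math import isqrt

def hasse_weil_bounds_genus2(q):
    """Return (lower, upper) integer bounds on #J_C(F_q) for a genus 2 curve.

    Uses (sqrt(q) +/- 1)^4 = A +/- 4(q+1)sqrt(q) with A = q^2 + 6q + 1, and
    ceil(A - x) = A - floor(x), floor(A + x) = A + floor(x) for an integer A,
    where floor(4(q+1)sqrt(q)) = isqrt(16 q (q+1)^2).
    """
    A = q * q + 6 * q + 1
    root = isqrt(16 * q * (q + 1) ** 2)
    return A - root, A + root
```

For an 80-bit prime field the two bounds differ from $q^2$ only by a term of size about $4q^{3/2}$, which is why the group order has roughly 160 bits.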

Due to Mumford [16], every point on $J_C(\mathbb{F}_q)$ can be represented uniquely by
a pair $(u(x), v(x))$ where $u(x)$ and $v(x)$ are polynomials in $\mathbb{F}_q[x]$ with
$\deg v(x) < \deg u(x) \le 2$ such that $u(x)$ divides $f(x) - v(x)^2$. The identity element
of the addition law is represented by $(1, 0)$. We refer to this representation as the
“Mumford representation” in the following. By using the Mumford representation of a
point on $J_C(\mathbb{F}_q)$, we obtain an algorithm for adding two points on $J_C(\mathbb{F}_q)$
(cf. Cantor's algorithm [3], Harley's algorithm [9]).

2.2 Hasse-Witt Matrix and the Order of Jc(Fq) 

There is a well-known method to calculate $\#J_C(\mathbb{F}_q) \pmod p$ by using the
Hasse-Witt matrix. The method is based on the following two theorems ([14, 22]).

Theorem 1. Let $y^2 = f(x)$ with $\deg f = 2g+1$ be the equation of a genus $g$
hyperelliptic curve. Denote by $c_i$ the coefficient of $x^i$ in the polynomial
$f(x)^{(p-1)/2}$. Then the Hasse-Witt matrix is given by $A = (c_{ip-j})_{1 \le i,j \le g}$.

For $A = (a_{ij})$, put $A^{(p)} = (a_{ij}^p)$. Then we have the following theorem.

Theorem 2. Let $C$ be a curve of genus $g$ defined over a finite field $\mathbb{F}_q$ where
$q = p^r$. Let $A$ be the Hasse-Witt matrix of $C$, and let
$A_\phi = A A^{(p)} A^{(p^2)} \cdots A^{(p^{r-1})}$. Let $\kappa(t)$ be the polynomial given by
$\det(A_\phi - tI_g)$ where $I_g$ is the $(g \times g)$ identity matrix, and $\chi_q(t)$ the
characteristic polynomial of the $q$-th power Frobenius endomorphism.
Then $\chi_q(t) \equiv (-1)^g t^g \kappa(t) \pmod p$.

Due to the above two theorems, we can calculate $\#J_C(\mathbb{F}_q) \pmod p$ by the
following formula:

$$\#J_C(\mathbb{F}_q) \equiv (-1)^g \kappa(1) \pmod p.$$

But this method is not practical in general when $p$ is very large.



3 Basic Idea for Our Algorithm 

We only consider the case of genus 2 in the following. Let $f(x)$ be a polynomial
in $\mathbb{F}_q[x]$ of degree 5 with no multiple root, $C$ a hyperelliptic curve over
$\mathbb{F}_q$ of genus 2 defined by the equation $y^2 = f(x)$. Then the characteristic
polynomial $\chi_q(t)$ of the $q$-th power Frobenius endomorphism of $C$ is of the form:

$$\chi_q(t) = t^4 - s_1 t^3 + s_2 t^2 - s_1 q t + q^2, \quad |s_1| \le 4\sqrt{q}, \quad |s_2| \le 6q.$$

Hence the order of $J_C(\mathbb{F}_q)$ is given by the following formula:

$$\#J_C(\mathbb{F}_q) = q^2 + 1 - s_1(q + 1) + s_2.$$

We also note the well-known fact that the $s_i$ are given by

$$s_1 = 1 + q - M_1 \quad \text{and} \quad s_2 = (M_2 - 1 - q^2 + s_1^2)/2$$

where $M_i$ is the number of $\mathbb{F}_{q^i}$-rational points on $C$ (cf. [11]).
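For very small $p$ these relations can be checked by brute force. The Python sketch below is an illustration under stated assumptions, not the method of this paper: it counts $M_1$ and $M_2$ for $y^2 = x^5 + ax$ by enumerating $\mathbb{F}_p$ and $\mathbb{F}_{p^2}$ (modelled as $\mathbb{F}_p[w]/(w^2 - n)$ for a non-residue $n$) and then applies the formulas above. All function names are hypothetical.

```python
def legendre(v, p):
    """Quadratic character of v in F_p: 1, -1, or 0."""
    v %= p
    if v == 0:
        return 0
    return 1 if pow(v, (p - 1) // 2, p) == 1 else -1

def count_points_fp(a, p):
    """M_1: points of y^2 = x^5 + a x over F_p, including the point at infinity."""
    return 1 + sum(1 + legendre(x**5 + a * x, p) for x in range(p))

def count_points_fp2(a, p):
    """M_2: points over F_{p^2}, elements stored as pairs (u, v) = u + v*w, w^2 = n."""
    n = next(t for t in range(2, p) if legendre(t, p) == -1)

    def mul(s, t):
        return ((s[0] * t[0] + n * s[1] * t[1]) % p, (s[0] * t[1] + s[1] * t[0]) % p)

    def power(s, e):
        r = (1, 0)
        while e:
            if e & 1:
                r = mul(r, s)
            s = mul(s, s)
            e >>= 1
        return r

    total = 1  # the point at infinity
    for u in range(p):
        for v in range(p):
            x = (u, v)
            x2 = mul(x, x)
            x5 = mul(mul(x2, x2), x)
            fx = ((x5[0] + a * u) % p, (x5[1] + a * v) % p)
            if fx == (0, 0):
                total += 1          # one point with y = 0
            elif power(fx, (p * p - 1) // 2) == (1, 0):
                total += 2          # f(x) is a non-zero square: two values of y
    return total

def jacobian_order_bruteforce(a, p):
    """#J_C(F_p) from M_1 and M_2 via s_1, s_2 (only feasible for tiny p)."""
    m1, m2 = count_points_fp(a, p), count_points_fp2(a, p)
    s1 = 1 + p - m1
    s2 = (m2 - 1 - p * p + s1 * s1) // 2
    return p * p + 1 - s1 * (p + 1) + s2
```

The cost grows like $p^2$, which is exactly why the closed formulas developed below are needed for cryptographic sizes.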

The following sharp bound is useful for calculating $\#J_C(\mathbb{F}_q)$.

Lemma 1 (cf. [17, 15]). $\lceil 2\sqrt{q}\,|s_1| - 2q \rceil \le s_2 \le \lfloor s_1^2/4 + 2q \rfloor$.

In the following we consider the case of $q = p$. When $q = p$, we obtain the
following lemma as a corollary of Theorems 1 and 2.

Lemma 2. Let $f(x)$, $s_i$, $p$ be as above and $c_i$ the coefficient of $x^i$ in $f(x)^{(p-1)/2}$.
Then $s_1 \equiv c_{p-1} + c_{2p-2} \pmod p$ and $s_2 \equiv c_{p-1}c_{2p-2} - c_{p-2}c_{2p-1} \pmod p$.

Remark 1. Since $|s_1| \le 4\sqrt{p}$, if $p > 64$ then $s_1$ is uniquely determined by $c_{p-1}$,
$c_{2p-2}$. Moreover, by Lemma 1, if $s_1$ is determined, then there are at most
five possibilities for the value of $s_2$.
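Lemma 2 can be tried out directly for small primes. The following Python sketch, with hypothetical helper names, expands $f(x)^{(p-1)/2}$ modulo $p$ and reads off the four coefficients; it is meant as an illustration and, as noted in Section 2.2, this approach is not practical for large $p$. Polynomials are coefficient lists, lowest degree first.

```python
def poly_mul_mod(f, g, p):
    """Product of two coefficient lists (lowest degree first) modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                out[i + j] = (out[i + j] + fi * gj) % p
    return out

def poly_pow_mod(f, e, p):
    """f^e modulo p by square-and-multiply."""
    result, base = [1], [c % p for c in f]
    while e:
        if e & 1:
            result = poly_mul_mod(result, base, p)
        base = poly_mul_mod(base, base, p)
        e >>= 1
    return result

def s1_s2_mod_p(f, p):
    """(s1 mod p, s2 mod p) for y^2 = f(x), deg f = 5, following Lemma 2."""
    h = poly_pow_mod(f, (p - 1) // 2, p)

    def c(i):
        return h[i] if 0 <= i < len(h) else 0

    s1 = (c(p - 1) + c(2 * p - 2)) % p
    s2 = (c(p - 1) * c(2 * p - 2) - c(p - 2) * c(2 * p - 1)) % p
    return s1, s2
```

For example, `s1_s2_mod_p([0, 1, 0, 0, 0, 1], 17)` treats $y^2 = x^5 + x$ over $\mathbb{F}_{17}$.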

Even in the case $q = p$ and $g = 2$, it is difficult in general to calculate $s_i$
$\pmod p$ by using Lemma 2 when $p$ is very large. But for hyperelliptic curves
of special type, it is possible to calculate them in a remarkably short time even
when $p$ is extremely large, e.g. 160-bit.

Here we consider hyperelliptic curves of type $y^2 = x^5 + ax$, $a \in \mathbb{F}_p$. We show
the following theorem, which is essential to construct our algorithm.

Theorem 3. Let $a$ be an element of $\mathbb{F}_p$, $C$ a hyperelliptic curve defined by the
equation $y^2 = x^5 + ax$ and $\chi_p(t)$ the characteristic polynomial of the $p$-th power
Frobenius endomorphism of $C$. Then $s_1$, $s_2$ in $\chi_p(t)$ are given as follows.

1. If $p \equiv 1 \pmod 8$, then

$$s_1 \equiv (-1)^{(p-1)/8}\, 2c\, \bigl(a^{3(p-1)/8} + a^{(p-1)/8}\bigr) \pmod p,$$
$$s_2 \equiv 4c^2 a^{(p-1)/2} \pmod p$$

where $c$ is an integer such that $p = c^2 + 2d^2$, $c \equiv 1 \pmod 4$ and $d \in \mathbb{Z}$.

2. If $p \equiv 3 \pmod 8$, then $s_1 \equiv 0 \pmod p$ and $s_2 \equiv -4c^2 a^{(p-1)/2} \pmod p$
where $c$ is an integer such that $p = c^2 + 2d^2$ and $d \in \mathbb{Z}$.

3. Otherwise, $s_1 \equiv 0 \pmod p$ and $s_2 \equiv 0 \pmod p$.
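The three cases of Theorem 3 translate into a few lines of modular arithmetic. The Python sketch below is an illustration with hypothetical function names; for simplicity it finds $c$ by exhaustive search, which is only reasonable for small $p$ (the algorithm of Section 5 uses Cornacchia's algorithm instead).

```python
from math import isqrt

def find_c(p):
    """Search for c with p = c^2 + 2 d^2, normalised so that c = 1 (mod 4)."""
    for c in range(1, isqrt(p) + 1):
        rest = p - c * c
        if rest % 2 == 0 and isqrt(rest // 2) ** 2 == rest // 2:
            return c if c % 4 == 1 else -c
    raise ValueError("p is not of the form c^2 + 2 d^2")

def theorem3_residues(a, p):
    """(s1 mod p, s2 mod p) for y^2 = x^5 + a x, following the cases of Theorem 3."""
    if p % 8 == 1:
        c = find_c(p)
        e = (p - 1) // 8
        sign = -1 if e % 2 else 1
        s1 = sign * 2 * c * (pow(a, 3 * e, p) + pow(a, e, p))
        s2 = 4 * c * c * pow(a, (p - 1) // 2, p)
        return s1 % p, s2 % p
    if p % 8 == 3:
        c = find_c(p)  # only c^2 enters, so the sign of c is irrelevant here
        return 0, (-4 * c * c * pow(a, (p - 1) // 2, p)) % p
    return 0, 0
```

The values agree with a direct expansion of $(x^5 + ax)^{(p-1)/2}$ for $p = 17$ and $p = 19$, e.g. $c_{16} = c_{32} = \binom{8}{2} \equiv 11 \pmod{17}$ for $a = 1$.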

Proof. Since
$(x^5 + ax)^{(p-1)/2} = \sum_{r=0}^{(p-1)/2} \binom{(p-1)/2}{r} x^{4r + (p-1)/2} a^{(p-1)/2 - r}$,
the necessary condition for an entry of the Hasse-Witt matrix
$A = \begin{pmatrix} c_{p-1} & c_{p-2} \\ c_{2p-1} & c_{2p-2} \end{pmatrix}$ of $C$
being non-zero is that there must be an integer $r$, $0 \le r \le (p-1)/2$, such that
$4r + (p-1)/2 = ip - j$. Then there are the following three possibilities:
(i) $A = \begin{pmatrix} c_{p-1} & 0 \\ 0 & c_{2p-2} \end{pmatrix}$ if $p \equiv 1 \pmod 8$,
(ii) $A = \begin{pmatrix} 0 & c_{p-2} \\ c_{2p-1} & 0 \end{pmatrix}$ if $p \equiv 3 \pmod 8$,
(iii) $A = O$ if $p \not\equiv 1, 3 \pmod 8$.






Eisaku Furukawa et al. 



Case (i). Put $f = (p-1)/8$. Then, since $4r + (p-1)/2 = p - 1$ for $c_{p-1}$, we have
$r = (p-1)/8 = f$ and $c_{p-1} = \binom{4f}{f} a^{3f}$. For $c_{2p-2}$, since $4r + (p-1)/2 = 2p - 2$,
we have $r = 3(p-1)/8 = 3f$ and $c_{2p-2} = \binom{4f}{3f} a^{f}$. From the result of
Hudson-Williams [10, Theorem 11.2], we have $\binom{4f}{f} \equiv (-1)^f 2c \pmod p$ where
$p = c^2 + 2d^2$ and $c \equiv 1 \pmod 4$. Since $\binom{4f}{f} = \binom{4f}{3f}$, we have the case (1).

Case (ii). By the condition, it is obvious that $s_1 \equiv 0 \pmod p$. Put $f = (p-3)/8$.
Then, since $4r + (p-1)/2 = p - 2$ for $c_{p-2}$, we have $r = (p-3)/8 = f$
and $c_{p-2} = \binom{4f+1}{f} a^{3f+1}$. For $c_{2p-1}$, since $4r + (p-1)/2 = 2p - 1$, we have
$r = (3p-1)/8 = 3f + 1$ and $c_{2p-1} = \binom{4f+1}{3f+1} a^{f}$. From the result of
Berndt-Evans-Williams [1, Theorem 12.9.7], $\binom{4f+1}{f} \equiv -2c \pmod p$ where
$p = c^2 + 2d^2$ and $c \equiv (-1)^f \pmod 4$. Since $\binom{4f+1}{3f+1} = \binom{4f+1}{f}$, we have
$s_2 \equiv -\binom{4f+1}{f}^2 a^{4f+1} \equiv -4c^2 a^{(p-1)/2} \pmod p$. Thus we obtain the case (2).

Case (iii). This is obvious and we obtain the case (3). □

Remark 2. Note that the order of $J_C(\mathbb{F}_p)$ for a curve of type $y^2 = x^5 + ax$ is
always even because $J_C(\mathbb{F}_p)$ has a point of order 2. By Lemma 1, if $p > 64$, then
there are at most three possibilities for the value of $s_2$.

By using Theorem 3 and Remark 2, we can calculate the (at most three) possibilities
for $\#J_C(\mathbb{F}_p)$ in a very short time. Then, to determine $\#J_C(\mathbb{F}_p)$, we only
have to multiply a random point on $J_C(\mathbb{F}_p)$ by each possible order. The following
remark is also important.

Remark 3. If $p > 16$, then in the cases (2) and (3) of Theorem 3 we have $s_1 = 0$.

4 Study on the Structure of the Jacobian Group 

Before describing our point counting algorithm, we study the structure of the 
Jacobian group for y 2 = x 5 + ax more precisely. First, we study the reducibility 
of the Jacobian variety over extension fields of the defining field F p . Second, we 
determine the characteristic polynomial of the p-th power Frobenius endomor- 
phism for many cases and give a necessary condition to be suitable for HCC 
explicitly. 

4.1 Reducibility of the Jacobian Variety 

We recall a few basic facts on the relation between the reducibility of the Jacobian 
variety and the characteristic polynomial of the Frobenius endomorphism. The 
following famous result was proved by Tate [18]: 

Theorem 4. Let $A_1$, $A_2$ be abelian varieties over $\mathbb{F}_q$ and $\chi_1(t)$, $\chi_2(t)$ the
characteristic polynomials of the $q$-th power Frobenius endomorphisms of $A_1$, $A_2$,
respectively. Then $A_1$ is isogenous to $A_2$ over $\mathbb{F}_q$ if and only if $\chi_1(t) = \chi_2(t)$.

The characteristic polynomial of the $q$-th power Frobenius endomorphism for
a simple abelian variety of dimension two over $\mathbb{F}_q$ is determined as follows:

Theorem 5 ([21], cf. [17, 19]). All possible characteristic polynomials $\chi_q(t)$
of $q$-th power Frobenius endomorphisms for simple abelian varieties of dimension
two over $\mathbb{F}_q = \mathbb{F}_{p^r}$ are the following:

1. $\chi_q(t) = t^4 - s_1 t^3 + s_2 t^2 - q s_1 t + q^2$ is irreducible in $\mathbb{Z}[t]$, where $s_1$, $s_2$
   satisfy some basic conditions,

2. $\chi_q(t) = (t^2 - q)^2$, $r$ is odd,

3. $\chi_q(t) = (t^2 + q)^2$, $r$ is even and $p \equiv 1 \pmod 4$,

4. $\chi_q(t) = (t^2 \pm q^{1/2} t + q)^2$, $r$ is even and $p \equiv 1 \pmod 3$.

For the reducibility of $\chi_{q^2}(t)$, the following lemma holds:

Lemma 3. Let $C$ be a hyperelliptic curve over $\mathbb{F}_q$ and
$\chi_q(t) = t^4 - s_1 t^3 + s_2 t^2 - q s_1 t + q^2$ the characteristic polynomial of the $q$-th power
Frobenius endomorphism of $C$, $\chi_{q^2}(t)$ the one of the $q^2$-th power Frobenius endomorphism.
Assume that $\chi_q(t)$ is irreducible in $\mathbb{Z}[t]$. Then $\chi_{q^2}(t)$ is reducible in
$\mathbb{Z}[t]$ if and only if $s_1 = 0$.

Proof. Let $\alpha, \bar\alpha, \beta, \bar\beta$ be the four roots of $\chi_q(t)$, where $\bar{\ }$ means
complex conjugate. Then it is a well-known fact that
$\chi_{q^2}(t) = (t - \alpha^2)(t - \bar\alpha^2)(t - \beta^2)(t - \bar\beta^2)$.

Assume that $s_1 = 0$. Put $\omega_1 = \alpha + \bar\alpha$ and $\omega_2 = \beta + \bar\beta$. Then from
$s_1 = 0$ and $s_2 \in \mathbb{Z}$, we have $\omega_1 + \omega_2 = 0$ and $\omega_1\omega_2 + 2q \in \mathbb{Z}$.
Put $m = \omega_1\omega_2 + 2q$. Then $\alpha^2 + \bar\alpha^2 = \omega_1^2 - 2q = -m$. We also have
$\beta^2 + \bar\beta^2 = -m$. Hence we have

$$\chi_{q^2}(t) = (t^2 + mt + q^2)^2.$$

Assume that $\chi_{q^2}(t)$ is not irreducible over $\mathbb{Z}$. First we consider the case where
$\chi_{q^2}(t)$ factors into a product of two polynomials of degree 2 over $\mathbb{Z}$. In this
case, there are two possibilities:
(a) $(t - \alpha^2)(t - \bar\alpha^2)$, $(t - \beta^2)(t - \bar\beta^2) \in \mathbb{Z}[t]$,
(b) $(t - \alpha^2)(t - \beta^2)$, $(t - \bar\alpha^2)(t - \bar\beta^2) \in \mathbb{Z}[t]$.
In case (a), $(\alpha + \bar\alpha)^2 = \alpha^2 + \bar\alpha^2 + 2q \in \mathbb{Z}$. We also have
$(\beta + \bar\beta)^2 \in \mathbb{Z}$. Since $\chi_q(t)$ is irreducible over $\mathbb{Z}$, $\alpha + \bar\alpha$ and
$\beta + \bar\beta$ are irrational numbers, and we obtain that
$s_1 = (\alpha + \bar\alpha) + (\beta + \bar\beta)$ must be zero. In case (b), since
$\alpha^2 + \beta^2$, $\bar\alpha^2 + \bar\beta^2$, $\alpha^2\beta^2$, $\bar\alpha^2\bar\beta^2$ are all in $\mathbb{Z}$,
we have $\alpha^2 + \beta^2 = \bar\alpha^2 + \bar\beta^2$ and $\alpha^2\beta^2 = \bar\alpha^2\bar\beta^2$.
Then $\alpha^2 = \bar\alpha^2$ or $\alpha^2 = \bar\beta^2$. Since $\chi_q(t)$ is irreducible, it cannot
have a double root. So we have $\alpha = -\bar\alpha$ or $\alpha = -\bar\beta$. Moreover,
$\alpha = -\bar\alpha$ does not occur, because if $\alpha = -\bar\alpha$ then $\chi_q(t)$ has a factor
$(t - \alpha)(t - \bar\alpha) = t^2 + q$ over $\mathbb{Z}$. Hence we obtain $\alpha = -\bar\beta$. Then
$\alpha + \bar\beta = 0$ and we have $s_1 = (\alpha + \bar\beta) + (\bar\alpha + \beta) = 0$.
Finally, we consider the case where $\chi_{q^2}(t)$ has a factor of degree 1 over $\mathbb{Z}$. But if
$t - \alpha^2 \in \mathbb{Z}[t]$ then we obtain $\alpha^2 = \bar\alpha^2$. As we showed in case (b), this
does not occur. □
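The first half of the proof can be checked numerically. Since the roots of $\chi_{q^2}(t)$ are the squares of the roots of $\chi_q(t)$, its coefficients follow from $s_1$, $s_2$ by the Newton identities. The Python sketch below (hypothetical function names, an illustration only) computes them and tests whether $\chi_{q^2}(t)$ has the shape $(t^2 + mt + q^2)^2$ that arises when $s_1 = 0$.

```python
def frobenius_square(s1, s2, q):
    """Coefficients (t1, t2) of chi_{q^2}(t) = t^4 - t1 t^3 + t2 t^2 - t1 q^2 t + q^4.

    With e1 = s1, e2 = s2, e3 = s1*q, e4 = q^2 the elementary symmetric
    functions of the roots of chi_q, the squared roots satisfy
    t1 = e1^2 - 2 e2 and t2 = e2^2 - 2 e1 e3 + 2 e4.
    """
    return s1 * s1 - 2 * s2, s2 * s2 - 2 * q * s1 * s1 + 2 * q * q

def is_square_of_quadratic(s1, s2, q):
    """True if chi_{q^2}(t) = (t^2 + m t + q^2)^2 for some integer m."""
    t1, t2 = frobenius_square(s1, s2, q)
    if t1 % 2:
        return False
    m = -t1 // 2
    return t2 == m * m + 2 * q * q
```

With $s_1 = 0$ the function returns `True` for every $s_2$, and $m = s_2$, in agreement with the computation in the proof.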

Now we consider the reducibility for the Jacobian variety of our curve
$y^2 = x^5 + ax$.

Lemma 4. Let $p$ be an odd prime and $C$ a hyperelliptic curve defined by
$y^2 = x^5 + ax$, $a \in \mathbb{F}_p$, and $\mathbb{F}_q = \mathbb{F}_{p^r}$, $r \ge 1$. If $a^{1/4} \in \mathbb{F}_q$,
then $J_C$ is isogenous to the product of the following two elliptic curves $E_1$ and $E_2$
over $\mathbb{F}_q$:

$$E_1 : Y^2 = X(X^2 + 4a^{1/4}X + 2a^{1/2}),$$

$$E_2 : Y^2 = X(X^2 - 4a^{1/4}X + 2a^{1/2}).$$

Proof. Let $\alpha$ be an element of $\mathbb{F}_q$ such that $\alpha^4 = a$. We can construct maps
$\varphi_i : C \to E_i$ explicitly as follows: $\varphi_i^*(X) = (x - (-1)^i\alpha)^2/x$,
$\varphi_i^*(Y) = (x - (-1)^i\alpha)\,y/x^2$, $i = 1, 2$. Indeed, writing $w = x + \alpha^2/x$ we have
$x^4 + a = x^2(w^2 - 2\alpha^2)$, which gives the constant term $2\alpha^2 = 2a^{1/2}$ above; the
two curves correspond to the two choices of sign of the fourth root. Since pull-backs of the
regular 1-forms $dX/Y$ on the $E_i$'s generate the space of regular 1-forms on $C$,
$\varphi_1 \times \varphi_2$ induces an isogeny from $J_C$ to $E_1 \times E_2$ (cf. [13, 12]). □

The Jacobian variety for curves of type $y^2 = x^5 + ax$ is reducible over $\mathbb{F}_{p^4}$ by
the above lemma. From a cryptographic point of view, if the Jacobian variety
splits over an extension field of degree two, HCC for these curves might lose its
advantage over ECC. Hence in the following, it is important to see whether the
Jacobian splits over an extension field $\mathbb{F}_{p^r}$ of lower degree, i.e. $r = 1, 2$.

Remark 4. If $\mathbb{F}_q$ includes a primitive 4-th root of unity, $E_1$ and $E_2$ in Lemma 4
are isomorphic to each other by the following transformation: $X \to -X$,
$Y \to \zeta_4 Y$, where $\zeta_4$ is a primitive 4-th root of unity in $\mathbb{F}_q$.



4.2 Determining the Characteristic Polynomial of the p-th Power Frobenius Endomorphism

Due to Theorem 3, we divide the situation into the following three cases:

(1) $p \equiv 1 \pmod 8$, (2) $p \equiv 3 \pmod 8$, (3) $p \equiv 5, 7 \pmod 8$.

The Case of $p \equiv 1 \pmod 8$.

Lemma 5. Let $p$ be a prime number such that $p \equiv 1 \pmod 8$ and $C$ a hyperelliptic
curve over $\mathbb{F}_p$ defined by an equation $y^2 = x^5 + ax$. If $a^{(p-1)/2} = 1$, then 4
divides $\#J_C(\mathbb{F}_p)$. Moreover, if $a^{(p-1)/4} = 1$, then 16 divides $\#J_C(\mathbb{F}_p)$.

Proof. First note that there is a primitive 8-th root of unity, $\zeta_8$, in $\mathbb{F}_p$ because 8
divides $p - 1$. If $a^{(p-1)/2} = 1$, then there exists an element $b \in \mathbb{F}_p$ such that
$b^2 = a$. Then

$$x^5 + ax = x^5 + b^2 x = x(x^2 + \zeta_8^2 b)(x^2 - \zeta_8^2 b).$$

It is easy to see that $(x, 0)$ and $(x^2 + \zeta_8^2 b, 0)$, which are points on $J_C(\mathbb{F}_p)$ in
the Mumford representation, generate a subgroup of order 4 in $J_C(\mathbb{F}_p)$. Hence 4
divides $\#J_C(\mathbb{F}_p)$.

If $a^{(p-1)/4} = 1$, there is an element $u$ in $\mathbb{F}_p$ such that $a = u^4$. Then

$$x^5 + ax = x^5 + u^4 x = x(x + \zeta_8 u)(x - \zeta_8 u)(x + \zeta_8^3 u)(x - \zeta_8^3 u).$$

It is easy to see that $(x, 0)$, $(x + \zeta_8 u, 0)$, $(x - \zeta_8 u, 0)$ and $(x + \zeta_8^3 u, 0)$ generate
a subgroup of order 16 in $J_C(\mathbb{F}_p)$. Hence 16 divides $\#J_C(\mathbb{F}_p)$. □

Theorem 6. Let $p$ be a prime number such that $p > 64$, $p \equiv 1 \pmod 8$ and $C$
a hyperelliptic curve over $\mathbb{F}_p$ defined by an equation $y^2 = x^5 + ax$. If
$\left(\frac{a}{p}\right) = 1$, then $\chi_p(t)$ is as follows:

1. if $p \equiv 1 \pmod{16}$ and $a^{(p-1)/8} = 1$, then $\chi_p(t) = (t^2 - 2ct + p)^2$,

2. if $p \equiv 9 \pmod{16}$ and $a^{(p-1)/8} = 1$, then $\chi_p(t) = (t^2 + 2ct + p)^2$,

3. if $p \equiv 1 \pmod{16}$ and $a^{(p-1)/8} = -1$, then $\chi_p(t) = (t^2 + 2ct + p)^2$,

4. if $p \equiv 9 \pmod{16}$ and $a^{(p-1)/8} = -1$, then $\chi_p(t) = (t^2 - 2ct + p)^2$,

5. otherwise, $\chi_p(t) = t^4 + (4c^2 - 2p)t^2 + p^2$,

where $p = c^2 + 2d^2$, $c, d \in \mathbb{Z}$ and $c \equiv 1 \pmod 4$.

Proof. First of all, from Theorem 3,
$s_1 \equiv (-1)^{(p-1)/8}\, 2c\, (a^{3(p-1)/8} + a^{(p-1)/8}) \pmod p$ and $s_2 \equiv 4c^2 \pmod p$
for all cases.

For the case (1), from Theorem 3 we have $s_1 \equiv 4c \pmod p$. By the definition
of $c$, $c^2 < p$ and hence $0 < |4c| < 4\sqrt{p}$. Since $p > 64$, Remark 1 gives
$s_1 = 4c$. Moreover, since $\lceil 2\sqrt{p}\,|s_1| - 2p \rceil \le s_2 \le \lfloor s_1^2/4 + 2p \rfloor$ and
$0 < 4c^2 < 4p$, $s_2$ is of the form $4c^2 + mp$, $-5 \le m \le 2$, $m \in \mathbb{Z}$. Then
$\#J_C(\mathbb{F}_p) = 1 + p^2 - 4c(1 + p) + 4c^2 + mp$ where $m$ is an integer such that
$-5 \le m \le 2$. Since $\#J_C(\mathbb{F}_p) \equiv 0 \pmod{16}$ from Lemma 5,
$1 + p^2 - 4c(1 + p) + 4c^2 + mp \equiv 0 \pmod{16}$. Since $p \equiv 1 \pmod{16}$ and
$c \equiv 1 \pmod 4$, we have $mp \equiv 2 \pmod{16}$ and then $m = 2$. Hence we obtain
$\chi_p(t) = t^4 - 4ct^3 + (4c^2 + 2p)t^2 - 4cpt + p^2 = (t^2 - 2ct + p)^2$.
The cases (2), (3), (4) can be shown in the same way.

For the case (5), $a^{(p-1)/8}$ is a primitive 4-th root of unity and
$a^{3(p-1)/8} + a^{(p-1)/8} = 0$. So we have that $s_1 = 0$ by Theorem 3 and $p > 64$. Since
$|s_2| \le 2p$ in this case by Lemma 1 and $0 < 4c^2 < 4p$ by the definition of $c$, $s_2$ is of
the form $4c^2 + mp$, $-5 \le m \le 1$, $m \in \mathbb{Z}$. On the other hand, since
$1 + p^2 \equiv 2 \pmod 4$ and $\#J_C(\mathbb{F}_p) \equiv 0 \pmod 4$ by Lemma 5, we have that $m = -2$.
Hence we obtain $\chi_p(t) = t^4 + (4c^2 - 2p)t^2 + p^2$. □

Hence in particular, if $p \equiv 1 \pmod 8$ and $\left(\frac{a}{p}\right) = 1$, then $C$ with
$a^{(p-1)/4} = 1$ is not suitable for HCC because $\#J_C(\mathbb{F}_p) = (p \pm 2c + 1)^2$ and
$|c| < \sqrt{p}$. In addition, $J_C$ in case (5) is isogenous to the product of two elliptic
curves over $\mathbb{F}_{p^2}$ because $a^{1/4} \in \mathbb{F}_{p^2}$.



The Case of $p \equiv 3 \pmod 8$.

Lemma 6. For a hyperelliptic curve $C : y^2 = x^5 + ax$, $a \in \mathbb{F}_p$, where
$p \equiv 3 \pmod 4$, the following hold:

1. if $\left(\frac{a}{p}\right) = 1$, then $\#J_C(\mathbb{F}_p) \equiv 0 \pmod 4$,

2. if $\left(\frac{a}{p}\right) = -1$, then $\#J_C(\mathbb{F}_p) \equiv 0 \pmod 8$.

Proof. If $\left(\frac{a}{p}\right) = 1$, then there exists an element $b \in \mathbb{F}_p$ such that
$a = b^2$. Since $\left(\frac{-1}{p}\right) = -1$ by $p \equiv 3 \pmod 4$, either $2b$ or $-2b$ is a
square. If $2b = u^2$, then

$$x^5 + ax = x\{(x^2 + b)^2 - 2bx^2\} = x(x^2 + ux + b)(x^2 - ux + b)$$

over $\mathbb{F}_p$, and $(x, 0)$ and $(x^2 + ux + b, 0)$ generate a subgroup of order 4 in
$J_C(\mathbb{F}_p)$. If $-2b = u^2$,

$$x^5 + ax = x\{(x^2 - b)^2 - (-2b)x^2\} = x(x^2 + ux - b)(x^2 - ux - b)$$

over $\mathbb{F}_p$, and $(x, 0)$ and $(x^2 + ux - b, 0)$ generate a subgroup of order 4 in
$J_C(\mathbb{F}_p)$.

If $\left(\frac{a}{p}\right) = -1$, then $x^5 + ax$ factors into the form
$x(x + \beta)(x - \beta)(x^2 + \gamma)$ over $\mathbb{F}_p$. It is easy to see that $(x, 0)$,
$(x + \beta, 0)$ and $(x - \beta, 0)$ generate a subgroup of order 8 in $J_C(\mathbb{F}_p)$. □

Theorem 7. Let $p$ be a prime number such that $p > 16$, $p \equiv 3 \pmod 8$ and $C$
a hyperelliptic curve over $\mathbb{F}_p$ defined by the equation $y^2 = x^5 + ax$. If
$\left(\frac{a}{p}\right) = 1$, then $\chi_p(t) = (t^2 + 2ct + p)(t^2 - 2ct + p)$ where
$p = c^2 + 2d^2$, $c, d \in \mathbb{Z}$.

Proof. The order of $J_C(\mathbb{F}_p)$ is given by $1 + p^2 + s_2$ because $s_1 = 0$. Moreover,
$s_2 \equiv -4c^2 a^{(p-1)/2} \equiv -4c^2 \pmod p$. Since $|s_2| \le 2p$, $s_2 = -4c^2 + mp$ where
$m \in \mathbb{Z}$ is such that $-2p \le -4c^2 + mp \le 2p$. By the definition of $c$, $0 < c^2 < p$
and $-4p < -4c^2 < 0$. Hence we have $-1 \le m \le 5$.

On the other hand, since 4 divides $\#J_C(\mathbb{F}_p)$ by Lemma 6, we have
$1 + p^2 + mp - 4c^2 \equiv 0 \pmod 4$. By $p \equiv 3 \pmod 8$ and $c^2 \equiv 1 \pmod 4$, we have
the condition $1 + p^2 + mp - 4c^2 \equiv 2 + 3m \equiv 0 \pmod 4$ and we obtain $m = 2$. Hence
$\chi_p(t) = t^4 + (2p - 4c^2)t^2 + p^2 = (t^2 + 2ct + p)(t^2 - 2ct + p)$. □

Theorem 8. Let $p$ be a prime number such that $p > 16$, $p \equiv 3 \pmod 8$ and $C$
a hyperelliptic curve over $\mathbb{F}_p$ defined by the equation $y^2 = x^5 + ax$. If
$\left(\frac{a}{p}\right) = -1$, then $\chi_p(t) = t^4 + (4c^2 - 2p)t^2 + p^2$ where
$p = c^2 + 2d^2$, $c, d \in \mathbb{Z}$.

Proof. In this case, $\#J_C(\mathbb{F}_p) = 1 + p^2 + mp + 4c^2$ where
$-2p \le mp + 4c^2 \le 2p$ and $-5 \le m \le 1$. Since 8 divides $\#J_C(\mathbb{F}_p)$ by Lemma 6,
$1 + p^2 + mp + 4c^2 \equiv 6 + 3m \equiv 0 \pmod 8$ and we obtain $m = -2$. Hence
$\chi_p(t) = t^4 + (4c^2 - 2p)t^2 + p^2$. □

Hence in this case, $\#J_C(\mathbb{F}_p)$ only depends on $p$ and the value of the Legendre
symbol $\left(\frac{a}{p}\right)$. In particular, $C$ is not suitable for HCC if
$\left(\frac{a}{p}\right) = 1$, because $\#J_C(\mathbb{F}_p) = (p + 2c + 1)(p - 2c + 1)$ and
$|c| < \sqrt{p}$. In addition, $J_C$ for the case of $\left(\frac{a}{p}\right) = -1$ is isogenous
to the product of two elliptic curves over $\mathbb{F}_{p^2}$ because $a^{1/4} \in \mathbb{F}_{p^2}$.

The Case of $p \equiv 5, 7 \pmod 8$. This is the case that the Jacobian variety $J_C$
is supersingular because $s_1 \equiv s_2 \equiv 0 \pmod p$ (cf. [21]).

Lemma 7. Let $p$ be a prime number such that $p > 16$ and $p \equiv 5 \pmod 8$. For
a hyperelliptic curve $C : y^2 = x^5 + ax$, $a \in \mathbb{F}_p$, the following hold:

1. if $a^{(p-1)/4} = 1$, then $\#J_C(\mathbb{F}_p) \equiv 0 \pmod 4$,

2. if $a^{(p-1)/4} = -1$, then $\#J_C(\mathbb{F}_p) \equiv 0 \pmod 8$.

Proof. Note that $\mathbb{F}_p$ has a primitive 4-th root of unity, $\zeta_4$, because
$p - 1 \equiv 0 \pmod 4$. Since $\left(\frac{a}{p}\right) = 1$ in both cases, there exists an
element $b \in \mathbb{F}_p$ such that $a = b^2$ and
$x^5 + ax = x(x^2 + \zeta_4 b)(x^2 - \zeta_4 b)$. Hence $(x, 0)$ and $(x^2 + \zeta_4 b, 0)$ generate
a subgroup of order 4 in $J_C(\mathbb{F}_p)$.

If $a^{(p-1)/4} = -1$, then $\left(\frac{b}{p}\right) = -1$ and $x^2 - \zeta_4 b$ factors into the
form $(x + \beta)(x - \beta)$ because $\left(\frac{\zeta_4}{p}\right) = -1$. Hence in case (2),
$(x, 0)$, $(x + \beta, 0)$, $(x - \beta, 0)$ generate a subgroup of order 8 in $J_C(\mathbb{F}_p)$. □

Using the above lemma and Lemma 6, we obtain the following theorem.

Theorem 9. Let $p$ be a prime number such that $p > 16$, $p \equiv 5, 7 \pmod 8$
and $C$ a hyperelliptic curve over $\mathbb{F}_p$ defined by the equation $y^2 = x^5 + ax$. Then,

1. if $p \equiv 5 \pmod 8$ and $a^{(p-1)/4} = 1$, then $\chi_p(t) = (t^2 + p)^2$,

2. if $p \equiv 5 \pmod 8$ and $a^{(p-1)/4} = -1$, then $\chi_p(t) = (t^2 - p)^2$,

3. if $p \equiv 5 \pmod 8$ and $\left(\frac{a}{p}\right) = -1$, then $\chi_p(t) = t^4 + p^2$,

4. if $p \equiv 7 \pmod 8$, then $\chi_p(t) = (t^2 + p)^2$.

Proof. The order of $J_C(\mathbb{F}_p)$ is given by $1 + p^2 + s_2$ because $s_1 = 0$. Moreover,
$s_2 = 0$ or $\pm 2p$ by Lemma 1 and Remark 2. Note that $1 + p^2 \equiv 2 \pmod 8$.

In case (1), $a^{1/4} \in \mathbb{F}_p$. Then $J_C$ is isogenous to the product of two elliptic
curves over $\mathbb{F}_p$ by Lemma 4. Hence by the list of Theorem 5, $\chi_p(t)$ must be
$(t^2 + p)^2$.

In case (2), $\#J_C(\mathbb{F}_p) \equiv 0 \pmod 8$ by Lemma 7. Then we obtain $s_2 = -2p$
and the result.

In case (3), we use the relation $s_2 = (M_2 - 1 - p^2 + s_1^2)/2$ where
$M_2 = \#C(\mathbb{F}_{p^2})$. Since $s_1 = 0$, $s_2 = (M_2 - 1 - p^2)/2$, and $M_2$ is given by
$1 + \#R + 2\#S$ where $R = \{x \in \mathbb{F}_{p^2} \mid x^5 + ax = 0\}$ and
$S = \{x \in \mathbb{F}_{p^2} \mid x^5 + ax \text{ is a non-zero square}\}$. Since $\mathbb{F}_{p^2}$ has a
primitive 8-th root of unity, $\zeta_8$, we easily see that if $u \in S$ then $\zeta_8^2 u \in S$.
Hence we have that 4 divides $\#S$. In the case of $p \equiv 5 \pmod 8$ and
$\left(\frac{a}{p}\right) = -1$, $\#R = 1$ and we have $M_2 \equiv 2 \pmod 8$. Hence in this case,
$s_2 \equiv 0 \pmod 4$ and we have that $s_2 = 0$.

In case (4), we divide the situation by the value of the Legendre symbol
$\left(\frac{a}{p}\right)$. If $\left(\frac{a}{p}\right) = 1$ then $a^{1/4} \in \mathbb{F}_p$ because
$\left(\frac{-1}{p}\right) = -1$. By this fact, if $\left(\frac{a}{p}\right) = 1$ then $J_C$ is
isogenous to the product of two elliptic curves over $\mathbb{F}_p$ and we obtain the result
as in case (1). For the case of $\left(\frac{a}{p}\right) = -1$, we have $s_2 = 2p$ by Lemma 6. □

So in this case, $C$ is not suitable for HCC if $p \equiv 5 \pmod 8$ with
$\left(\frac{a}{p}\right) = 1$ or $p \equiv 7 \pmod 8$, because $\#J_C(\mathbb{F}_p) = (p \pm 1)^2$. If
$p \equiv 5 \pmod 8$ and $\left(\frac{a}{p}\right) = -1$, $\chi_{p^2}(t)$ splits because $s_1 = 0$,
but $J_C$ is simple over $\mathbb{F}_{p^2}$ by Theorem 5.

4.3 Necessary Condition to be Suitable for HCC

From the results in 4.2, we have the following corollary.

Corollary 1. Let $p$ be a prime number and $C$ a hyperelliptic curve defined by
an equation $y^2 = x^5 + ax$ where $a \in \mathbb{F}_p$. Then $C$ is not suitable for HCC if one
of the following holds: (1) $p \equiv 1 \pmod 8$, $a^{(p-1)/4} = 1$, (2) $p \equiv 3 \pmod 8$,
$\left(\frac{a}{p}\right) = 1$, (3) $p \equiv 5 \pmod 8$, $\left(\frac{a}{p}\right) = 1$,
(4) $p \equiv 7 \pmod 8$.

In addition to the above cases, if $p \equiv 1 \pmod 8$ with $a^{(p-1)/4} = -1$ or
$p \equiv 3 \pmod 8$ with $\left(\frac{a}{p}\right) = -1$, $J_C$ is isogenous to the product of two
elliptic curves over $\mathbb{F}_{p^2}$.
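The four exclusion conditions of Corollary 1 only need two modular exponentiations, so they make a cheap pre-filter before any point counting. The following Python sketch is an illustration; the function name is hypothetical, and $a$ is assumed to be non-zero modulo an odd prime $p$.

```python
def excluded_by_corollary1(a, p):
    """True if y^2 = x^5 + a x over F_p falls under one of the cases (1)-(4)."""
    r = p % 8
    if r == 1:
        return pow(a, (p - 1) // 4, p) == 1       # case (1)
    if r in (3, 5):
        return pow(a, (p - 1) // 2, p) == 1       # cases (2), (3): Legendre symbol 1
    return True                                   # case (4): p = 7 (mod 8)
```

A curve that passes this filter is not automatically suitable: the additional remark on splitting over $\mathbb{F}_{p^2}$ and the supersingular case still have to be taken into account.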

5 Point Counting Algorithm and Searching Suitable Curves

In this section we search for curves suitable for HCC among hyperelliptic curves
of type $y^2 = x^5 + ax$, $a \in \mathbb{F}_p$. From the result of the previous section, all the
cases which can have suitable orders are the following: (1) $p \equiv 1 \pmod 8$ with
$\left(\frac{a}{p}\right) = -1$, (2) $p \equiv 1 \pmod 8$ with $a^{(p-1)/4} = -1$, (3) $p \equiv 3 \pmod 8$
with $\left(\frac{a}{p}\right) = -1$, (4) $p \equiv 5 \pmod 8$ with $\left(\frac{a}{p}\right) = -1$. But as we
remarked in 4.2 and 4.3, the $J_C$'s are reducible over $\mathbb{F}_{p^2}$ in the cases (2) and (3).
Moreover, $J_C$ is supersingular in the case (4), as we remarked in 4.2. Hence we exclude
these cases and only focus on the remaining case (1): $p \equiv 1 \pmod 8$ with
$\left(\frac{a}{p}\right) = -1$.

On the other hand, since the Jacobian group $J_C(\mathbb{F}_p)$ for our curve has a 2-torsion
point (Remark 2), the best possible order of $J_C(\mathbb{F}_p)$ is $2\ell$ where $\ell$ is prime. The
case (1) above is the case in which we can obtain the best possible order.

For the case (1) we cannot determine the characteristic polynomial of the $p$-th
power Frobenius endomorphism by using the same method as in 4.2. So we need
a point counting algorithm for $J_C(\mathbb{F}_p)$. First we describe our algorithm, and next
we show the result of the search based on our algorithm.



5.1 Point Counting Algorithm for $p \equiv 1 \pmod 8$ and $\left(\frac{a}{p}\right) = -1$

We describe our algorithm based on Theorem 3. The algorithm is as follows:

Algorithm 1

Input: $a \in \mathbb{F}_p$ where $p \equiv 1 \pmod 8$ and $p > 64$

Output: $\#J_C(\mathbb{F}_p)$ ($C$: a hyperelliptic curve of genus 2 defined by $y^2 = x^5 + ax$)

1. Calculate an integer $c$ such that $p = c^2 + 2d^2$, $c \equiv 1 \pmod 4$, $d \in \mathbb{Z}$ by using
   Cornacchia's Algorithm.

2. Determine $s_1$:

   $s \leftarrow (-1)^{(p-1)/8}\, 2c\, (a^{3(p-1)/8} + a^{(p-1)/8}) \pmod p$ $(0 \le s \le p - 1)$

   If $s < 4\sqrt{p}$, then $s_1 \leftarrow s$, else $s_1 \leftarrow s - p$.

3. Determine the list $S$ of candidates for $s_2$:

   $t \leftarrow 4c^2 a^{(p-1)/2} \pmod p$ $(0 \le t \le p - 1)$

   If $t$ is even, then $S \leftarrow \{t + 2mp \mid 2\sqrt{p}\,|s_1| - 2p \le t + 2mp \le s_1^2/4 + 2p\}$,
   else $S \leftarrow \{t + (2m+1)p \mid 2\sqrt{p}\,|s_1| - 2p \le t + (2m+1)p \le s_1^2/4 + 2p\}$.

4. Determine the list $L$ of candidates for $\#J_C(\mathbb{F}_p)$:

   $L \leftarrow \{1 + p^2 - s_1(p + 1) + s_2 \mid s_2 \in S\}$. ($\#L \le 3$ by Remark 2.)

5. If $\#L = 1$, then return the unique element of $L$,
   else determine $\#J_C(\mathbb{F}_p)$ by multiplying a random point $D$ on $J_C(\mathbb{F}_p)$ by each
   element of $L$.

It is easy to show that the expected running time of the above algorithm is
$O(\ln^4 p)$. (For an estimation for Cornacchia's algorithm and so on, see Cohen's
book [5] for example.)
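Steps 1 to 4 of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Mathematica program: the function names are hypothetical, the square root of $-2$ modulo $p$ needed by Cornacchia's algorithm is computed with Tonelli-Shanks, and Step 5 (which requires arithmetic in the Jacobian) is left out, so the function returns the whole candidate list $L$.

```python
from math import isqrt

def sqrt_mod(n, p):
    """A square root of n modulo an odd prime p (Tonelli-Shanks)."""
    n %= p
    if n == 0:
        return 0
    if pow(n, (p - 1) // 2, p) != 1:
        raise ValueError("not a quadratic residue")
    q, s = p - 1, 0
    while q % 2 == 0:
        q //= 2
        s += 1
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:
        z += 1
    m, c, t, r = s, pow(z, q, p), pow(n, q, p), pow(n, (q + 1) // 2, p)
    while t != 1:
        i, t2 = 0, t
        while t2 != 1:
            t2 = t2 * t2 % p
            i += 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c, t, r = i, b * b % p, t * b * b % p, r * b % p
    return r

def cornacchia_c(p):
    """Step 1: c with p = c^2 + 2 d^2 and c = 1 (mod 4)."""
    x0 = sqrt_mod(-2, p)
    if 2 * x0 < p:
        x0 = p - x0
    a, b, limit = p, x0, isqrt(p)
    while b > limit:                      # Euclidean descent
        a, b = b, a % b
    rest = p - b * b
    if rest % 2 or isqrt(rest // 2) ** 2 != rest // 2:
        raise ValueError("no representation p = c^2 + 2 d^2")
    return b if b % 4 == 1 else -b

def candidate_orders(a, p):
    """Steps 2-4: the list L of candidates for #J_C(F_p), p = 1 (mod 8), p > 64."""
    c = cornacchia_c(p)
    e = (p - 1) // 8
    sign = -1 if e % 2 else 1
    s = sign * 2 * c * (pow(a, 3 * e, p) + pow(a, e, p)) % p
    s1 = s if s * s < 16 * p else s - p   # lift using |s1| <= 4 sqrt(p)
    t = 4 * c * c * pow(a, (p - 1) // 2, p) % p
    n = 4 * p * s1 * s1
    lo = (isqrt(n - 1) + 1 if n else 0) - 2 * p   # ceil(2 sqrt(p) |s1|) - 2p
    hi = s1 * s1 // 4 + 2 * p
    k = -((t - lo) // p)                  # smallest k with t + k p >= lo
    if (k - t) % 2:                       # k must have the parity of t (s2 is even)
        k += 1
    S = []
    while t + k * p <= hi:
        S.append(t + k * p)
        k += 2
    return [1 + p * p - s1 * (p + 1) + s2 for s2 in S]
```

For $p = 73$ and $a = 1$ the list contains $5776 = (p + 2c + 1)^2$ with $c = 1$, the value predicted by Theorem 6; for the parameters of interest, with $\left(\frac{a}{p}\right) = -1$, the correct candidate still has to be singled out as in Step 5.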

5.2 Searching Suitable Curves for HCC and Results

Here we show the result of our search for hyperelliptic curves suitable for
HCC among hyperelliptic curves of type $y^2 = x^5 + ax$, $a \in \mathbb{F}_p$.

Our search is based on the algorithm which we proposed in 5.1. All computations
below were done with Mathematica 4.1 on a Celeron 600MHz.

Example 1. The following are examples of curves such that the orders of their
Jacobian groups are of the form $2 \cdot (\text{prime})$.

p = 1208925819614629175095961 (81-bit), a = 3,
#J_C(F_p) = 2 · 730750818666480869498570026461293846666412451841 (160-bit)
(The computation for counting points took 0.04s.)

p = 2923003274661805836407369665432566039311865180529 (162-bit), a = 371293,
#J_C(F_p) = 2 · 4271974071841820164790042159200669057836414062331724137933565193825968686576267080087081984838097 (321-bit)
(The computation for counting points took 0.07s.)

In the above examples, the $J_C$'s are simple over $\mathbb{F}_{p^2}$. Since $\#J_C(\mathbb{F}_p)$ has a
large prime factor, the characteristic polynomial of the $p$-th power Frobenius
endomorphism must be irreducible. Moreover, since $s_1 \ne 0$, the characteristic
polynomial of the $p^2$-th power Frobenius endomorphism cannot split by Lemma 3.

Furthermore, one can easily check that the large prime factors of the above
$\#J_C(\mathbb{F}_p)$ do not divide $p^r - 1$, $r = 1, 2, \ldots, 2^3 \lfloor \log_2 p \rfloor$. Hence these
curves are not weak against the Frey-Rück attack [7].

Example 2. The following table shows the result of the search for many $p$'s. We can
find the following number of suitable curves for each search range.

search range (r, s)              num. of primes     num. of curves s.t.           time
for p, r < p < s                 p ≡ 1 (mod 8)      #J_C(F_p) = 2 × (prime)       (seconds)

2^80,  2^80  + 10^6              4441               366                           416.67
2^81,  2^81  + 10^6              4309               352                           409.72
2^161, 2^161 + 10^6              2276               93                            497.49
2^325, 2^325 + 10^6              1100               30                            731.52


Remark 5. From the result of Duursma, Gaudry and Morain [6], an automorphism
of large order can be exploited to accelerate Pollard's rho algorithm.
If there is an automorphism of order $m$, we can get a speed-up of $\sqrt{m}$. The order
of any automorphism of $y^2 = x^5 + ax$ is at most 8. So Pollard's rho algorithm
for these curves can be improved only by a factor of $\sqrt{8}$.

6 Point Counting Algorithm for another Curve: $y^2 = x^5 + a$

In this section, we consider another curve $y^2 = x^5 + a$, $a \in \mathbb{F}_p$. In the case where
$a$ is a square in $\mathbb{F}_p$, it is a kind of Buhler-Koblitz curve [2]. Hence we consider the
case where $a$ is a non-square.

Theorem 10. Let $p$ be an odd prime number such that $p \equiv 1 \pmod 5$, and $C$
a hyperelliptic curve defined by the equation $y^2 = x^5 + a$ where $a \in \mathbb{F}_p$. Moreover,
let $J_5(\chi, \chi) = \sum_{s \in \mathbb{F}_p} \chi(s)\chi(1 - s)$ be the Jacobi sum for a character
$\chi$ of $\mathbb{F}_p$ which maps a fixed non-quintic element in $\mathbb{F}_p$ to $\zeta = e^{2\pi i/5}$, and let
$c_1, c_2, c_3, c_4$ be the coefficients of $\zeta^i$ in the expression
$J_5(\chi, \chi) = c_1\zeta + c_2\zeta^2 + c_3\zeta^3 + c_4\zeta^4$. Then for the characteristic polynomial
$t^4 - s_1 t^3 + s_2 t^2 - s_1 p t + p^2$ of the $p$-th power Frobenius endomorphism of $C$,
$s_1$, $s_2$ are given as follows:

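$$s_1 \equiv \tfrac{1}{2}\alpha^3(-z + \beta)\, a^{3(p-1)/10} + \tfrac{1}{2}\alpha(-z - \beta)\, a^{(p-1)/10} \pmod p,$$

$$s_2 \equiv \tfrac{1}{4}\alpha^4 (z^2 - \beta^2)\, a^{2(p-1)/5} \pmod p$$

where $\alpha \equiv \ldots \pmod p$, $\beta = \frac{z^2 - 125w^2}{4(zw + uv)}$, and $z, u, v, w$ are given by
$z = -(c_1 + c_2 + c_3 + c_4)$, $5u = c_1 + 2c_2 - 2c_3 - c_4$, $5v = 2c_1 - c_2 + c_3 - 2c_4$,
$5w = c_1 - c_2 - c_3 + c_4$.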

Proof. Since
$(x^5 + a)^{(p-1)/2} = \sum_{r=0}^{(p-1)/2} \binom{(p-1)/2}{r} x^{5r} a^{(p-1)/2 - r}$ and
$p \equiv 1 \pmod 5$, the Hasse-Witt matrix of $C$ is of the form
$\begin{pmatrix} c_{p-1} & 0 \\ 0 & c_{2p-2} \end{pmatrix}$. Put $f = (p-1)/10$.
Then $s_1 \equiv \binom{5f}{2f} a^{3f} + \binom{5f}{4f} a^{f} \pmod p$ and
$s_2 \equiv \binom{5f}{2f}\binom{5f}{4f} a^{4f} \pmod p$. From the result of [10, Theorem 13.1],
$\binom{5f}{2f} \equiv \tfrac{1}{2}\alpha^3(-z + \beta)$ and
$\binom{5f}{4f} \equiv \tfrac{1}{2}\alpha(-z - \beta) \pmod p$ with
$\beta = \frac{z^2 - 125w^2}{4(zw + uv)}$. Hence we obtain the result. □

Remark 6. If $p \not\equiv 1 \pmod 5$, then the Hasse-Witt matrix is of the form
$\begin{pmatrix} 0 & c_{p-2} \\ 0 & 0 \end{pmatrix}$ or
$\begin{pmatrix} 0 & 0 \\ c_{2p-1} & 0 \end{pmatrix}$. Hence $s_1 \equiv s_2 \equiv 0 \pmod p$ and
$J_C$ is supersingular [21].

From the above theorem, we obtain a point counting algorithm for curves of
type $y^2 = x^5 + a$ over $\mathbb{F}_p$ when $p \equiv 1 \pmod 5$. The algorithm is as follows:

Algorithm 2

Input: $a \in \mathbb{F}_p$ where $p \equiv 1 \pmod 5$ and $p > 64$.

Output: $\#J_C(\mathbb{F}_p)$ ($C$: a hyperelliptic curve of genus 2 defined by $y^2 = x^5 + a$).

1. Calculate the coefficients $c_1, c_2, c_3, c_4$ in $J_5(\chi, \chi) = \sum_{i=1}^{4} c_i\zeta^i$ of
   Theorem 10 by using the LLL algorithm. (See [2] for details.)

2. Determine $s_1$ by Theorem 10 and the bound $|s_1| \le 4\sqrt{p}$.

3. Determine the list of candidates for $s_2$ by Theorem 10 and Lemma 1.

4. Determine the list $L$ of candidates for $\#J_C(\mathbb{F}_p)$ from the results of Steps 2
   and 3. ($\#L \le 5$ by Remark 1.)

5. If $\#L = 1$, then return the unique element of $L$, else determine $\#J_C(\mathbb{F}_p)$
   by multiplying a random point $D$ on $J_C(\mathbb{F}_p)$ by each element of $L$.

We show the result of our search for curves suitable for HCC among hyperelliptic
curves of type $y^2 = x^5 + a$, $a \in \mathbb{F}_p$, where $a$ is a non-square. All computations
below were done with Mathematica 4.1 on a Celeron 600MHz.

Example 3. The following are examples of curves whose Jacobian groups have
prime orders.

p = 1208925819614629174708801 (81-bit), a = 1331,
#J_C(F_p) = 1461501637326815988079848163961117521046955445901 (160-bit)
(The computation for counting points took 0.18s.)

p = 1208925819614629174709941 (81-bit), a = 2,
#J_C(F_p) = 1461501637331762771847359428275278989652932675771 (161-bit)
(The computation for counting points took 24.58s.)

In these examples, the $J_C$'s are simple over $\mathbb{F}_{p^2}$ by Lemma 3 and not weak against
the Frey-Rück attack.

Example 4. The following table shows the result of the search for many $p$'s. We can
find the following number of suitable curves for each search range.

search range (r, s)              num. of primes     num. of curves s.t.       time
for p, r < p < s                 p ≡ 1 (mod 5)      #J_C(F_p) = prime         (seconds)

2^80,  2^80  + 10^4              50                 7                         237.67
2^81,  2^81  + 10^4              40                 7                         224.16
2^82,  2^82  + 10^4              39                 5                         297.13
2^100, 2^100 + 10^4              33                 5                         335.76

Remark 7. The order of any automorphism of $y^2 = x^5 + a$ is at most 10. So, as
we noted in Remark 5, Pollard's rho algorithm for these curves
can be improved only by a factor of $\sqrt{10}$.

Abstract. Increasing key length is a standard counter-measure to cryptanalysis.
However, longer key length generally means greater side channel
leakage. For embedded RSA crypto-systems the increase in leaked
data outstrips the increase in secret data so that, in contrast to the im- 
proved mathematical strength, longer keys may, in fact, lead to lower 
security. This is investigated for two types of implementation attack. 
The first is a timing attack in which squares and multiplications are 
differentiated from the relative frequencies of conditional subtractions 
over several exponentiations. Once keys are large enough, longer length 
seems to decrease security. The second case is a power analysis attack on 
a single m-ary exponentiation using a single k-bit hardware multiplier.
For this, despite certain counter-measures such as exponent blinding, un- 
certainty in determining the secret bits decreases so quickly that longer 
keys appear to be noticeably less secure. 

Keywords: RSA Cryptosystem, Key Length, Side Channel Attacks, 
Timing Attack, Power Analysis, DPA. 



1 Introduction 

So-called side channel attacks on smartcards to discover secret keys contained 
therein follow a well-established tradition pursued by the military and secret 
services, and exemplified by the long-running Tempest project of the US [27]. 
That project concentrated on detecting and obscuring electro-magnetic radiation 
(EMR) and led to both heavily shielded monitors (for those based on electron 
beam technology) and TV detector vans. EMR can be, and is, used to break 
smartcards - but with a somewhat smaller aerial, one some 3mm long or less [5]. 
If correctly placed and set up with sufficiently sensitive equipment, these can 
detect useful variations in EMR non-invasively. 

Side-channel leakage occurs through data dependent variation in the use of re- 
sources such as time and hardware. The former results from branching in the code 
or compiler optimisation [2, 9], and the latter manifests itself through current 
variation as well as EMR [10, 12, 13]. For the RSA crypto-system [17], conditional 
modular subtractions should be removed to make the time constant [18, 19, 21].
Bus activity is the major cause of power variation, with a strong relationship
between it and the Hamming weight of the data on the bus. Instructions and
memory locations pass along the bus and, in the context of the limited compu- 
tational resources of a smartcard, potentially also large quantities of data. This 
is partly solved by encryption of the bus [1, 11]. 

For all the popular crypto-systems used in practice where the key length is 
variable, the greater the key length, the greater the mathematical strength of the 
system against attack is believed to be. Indeed, a brute force attack will take 
time exponential in the key length. However, longer key lengths require more 
computation for encryption and decryption or signature and verification. Hence 
there is more data which leaks through timing, current and EMR variation. In an 
embedded crypto-system to which an attacker has access, such as a smartcard, 
a valid question to ask is whether or not the increased data from side channel 
leakage actually makes longer keys more vulnerable to attack. 

In the symmetric crypto-systems of DES, 3-DES and AES [25, 26], the block 
length is fixed and the number of rounds is proportional to the key length (com- 
paring DES with 3-DES, and AES with different choices for its parameter Nk). 
Hence the data leakage is also proportional to the key length and the implement- 
ation strength of the cipher is unlikely to decrease as key length increases. 

However, public key cryptography such as in RSA, DSA, ECC, Diffie-Hellman 
or El-Gamal [3, 4, 14, 17, 24], usually involves exponentiation in some form, 
where the block length and exponent are proportional to the key length. As- 
suming multiplication of double-length arguments takes four times the time for 
single-length arguments on the same hardware, total decryption/signing time is 
proportional to the cube of the key length. Consequently, more leaked data is 
available per key bit as key length grows. Indeed, if the multiplicative operations 
of the exponentiation are performed sequentially using one of the standard al- 
gorithms [7, 8] and no form of blinding, then there is more data per exponent 
bit for longer key lengths and one should expect the implementation strength to 
decrease. 
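
The scaling argument can be written out explicitly. The following is only a sketch: it assumes a k-bit multiplier, s = n/k digits per operand and digit-by-digit (schoolbook) multiplication, so the constants of proportionality are indicative rather than exact.

```latex
\begin{align*}
\text{digit products per modular multiplication} &\;\propto\; s^2 = (n/k)^2, \\
\text{modular multiplications per exponentiation} &\;\propto\; n, \\
\text{digit products per exponentiation} &\;\propto\; n \cdot (n/k)^2 \;\propto\; n^3, \\
\text{digit products per exponent bit} &\;\propto\; n^3 / n = n^2 .
\end{align*}
```

Under these assumptions, doubling n multiplies the total observable work by about 8 and the work associated with each secret bit by about 4.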

Two attacks illustrate that such an outcome may be possible from increas- 
ing the key length. The first is a timing attack [21] whose success is apparently 
easier for very short and very long key lengths. By its very nature, the attack 
assumes essentially none of the currently expected standard counter-measures 
which would ensure no data-dependent time variations and would introduce ran- 
dom variation in the exponent. Each key bit is determined by data proportional 
to the square of the key length. 

The second attack [20] requires more expensive monitoring equipment to 
perform and uses power analysis and/or EMR. It is applied to a single expo- 
nentiation and so avoids any difficulty arising from employing exponent blinding 
as a counter-measure. Key bits are determined independently with each using 
all available data, that is, with data proportional to the cube of the key length. 
Individual exponent bits are now identified correctly with an accuracy which in- 
creases sufficiently quickly to compensate for the increased number of key bits. 
Consequently the attack becomes easier with longer key lengths.

Having assessed the vulnerabilities, our conclusion is that indeed increased
key length will normally lead to weaker implementations unless appropriate
counter-measures are taken. As this is counter-intuitive, it achieves the main aim 
of the paper, namely to provide the justification for cautioning readers strongly 
against the temptation to assume that counter-measures to cryptanalysis can be 
used successfully as counter-measures to side channel leakage. 

2 Security Model 

The contexts for the two attacks [20, 21] are slightly different, but, for con- 
venience, in both cases we assume a similar, but realistic, scenario. For each, 
a smartcard is performing RSA with limited resources and must be re-usable 
after the attack. The attacker is therefore limited in what he is allowed to do: he 
can only monitor side channel leakage. He cannot choose any inputs, nor can he 
read inputs or outputs. In most well-designed crypto-systems, the I/O will be 
blinded and the attacker will be able to see at most the unblinded data which is 
not used directly in the observed computations. However, the attacker is allowed 
to know the algorithms involved, perhaps as a result of previous destructive stud- 
ies of identical cards, insider information and public specifications. His goal is to 
determine the secret exponent D whether or not the Chinese Remainder The- 
orem has been used, and he may use knowledge of the public modulus M and 
public exponent E to confirm proposed values. It is assumed that the m-ary
exponentiation algorithm is used, but similar arguments apply to other classical 
(i.e. non-randomised) algorithms, such as sliding windows. 

The timing attack [21] assumes the use of a modular multiplication algorithm 
which includes a final, conditional subtraction of the modulus. We assume the 
consequent timing variations enable the attacker to record accurately almost all 
occurrences of these subtractions. He then observes a number of exponentiations 
for which the same, unblinded exponent is used. We demonstrate the attack 
using an implementation of Montgomery’s method which is described in the 
next section. 

Power use, and hence also EMR, varies with the amount of switching activity 
in a circuit. The average number of gates switched in a multiplier is close to linear 
in the sum of the Hamming weights of the inputs, and the same is true for the 
buses taking I/O to and from the multiplier. So, by employing a combination 
of power and EMR measurements from carefully positioned probes [5], it should
be assumed that an attacker can obtain some data, however minimal, which is 
related to the sum of the Hamming weights of these inputs. His problem is to 
combine these in a manner which reveals the Hamming weights with sufficient 
accuracy to deduce the digits of the exponent. The differential power analysis 
(DPA) attack [20] shows how this might be done from observations of a single 
exponentiation. Hence it does not matter if the exponent has been masked by 
the addition of, say, a 32-bit random multiple of $\phi(M)$ [9].



Longer Keys May Facilitate Side Channel Attacks 






3 Notation 

As above, we assume an n-bit modulus M and private exponent D for the
RSA crypto-system. Ciphertext C has to be converted to plaintext $C^D \bmod M$
using a single, small k-bit multiplier. Hence, except for the exponent, the n-bit
numbers X involved in the exponentiation are represented using base $r = 2^k$
and (non-redundant) digits $x_i$ ($0 \le i < s$, say) in the range $[0, r)$. Thus
$X = \sum_{i=0}^{s-1} x_i r^i$ and, without loss of generality, $n = ks$.

The exponent D is represented with a different base m, typically 2 or 
4, depending on the exponentiation algorithm. Exponentiation is usually per- 
formed using the binary “square-and-multiply” algorithm, processing the expo- 
nent bits in either order, or the generalisation of the most-to-least significant 
case, called m-ary exponentiation [7, 8], in which D is represented in radix m 
using, say, t digits, and the powers $C^{(i)} = C^i \bmod M$ ($1 \le i < m$)
are pre-computed:

The m-ary (Modular) Exponentiation Algorithm

    C^(1) <- C ;
    For i <- 2 to m-1 do
        C^(i) <- C^(i-1) x C mod M ;
    P <- C^(d_{t-1}) ;
    For i <- t-2 downto 0 do
    Begin
        P <- P^m mod M ;
        If d_i ≠ 0 then P <- P x C^(d_i) mod M ;
    End ;

    Output: P = C^D mod M for D = sum_{i=0}^{t-1} d_i m^i
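
The algorithm above can be sketched in Python as follows. This is an illustration only: the function name `m_ary_pow`, the returned operation trace `ops` and the restriction to m being a power of two (so that raising to the power m is a run of squarings) are assumptions made here, not part of the original description.

```python
def m_ary_pow(C, D, M, m=4):
    """Left-to-right m-ary exponentiation; returns (C**D mod M, operation trace).

    Assumes m is a power of two and D >= 1.
    """
    # Radix-m digits of D, most significant first.
    digits = []
    d = D
    while d:
        digits.append(d % m)
        d //= m
    digits.reverse()

    # Pre-compute table[i] = C**i mod M for 1 <= i < m (table[0] is unused).
    table = [1, C % M]
    for i in range(2, m):
        table.append(table[i - 1] * C % M)

    P = table[digits[0]]
    ops = []  # "S" for a squaring, "M<i>" for a multiplication by C^(i)
    for digit in digits[1:]:
        # P <- P**m mod M, performed as log2(m) squarings.
        for _ in range(m.bit_length() - 1):
            P = P * P % M
            ops.append("S")
        if digit != 0:
            P = P * table[digit] % M
            ops.append(f"M{digit}")
    return P, ops


result, ops = m_ary_pow(7, 45, 1000003, m=4)
print(result == pow(7, 45, 1000003), ops)
```

The trace `ops` is exactly the sequence of squarings and multiplications that a side channel attacker tries to recover: knowing it, and which pre-computed power each multiplication used, determines D.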

The modular products here are too large for the smartcard multiplier to per- 
form in one operation. Typically a form of Montgomery’s modular multiplication 
algorithm (MMM) is used [16]. This gives an output related to $(A \times B) \bmod M$
via a scaling factor $R = r^s$ which is determined by the number of digits in in-
put A. The form of interest here includes a final conditional subtraction which
reduces the output to less than M, but causes variation in the time taken. 

Montgomery’s Modular Multiplication Algorithm (MMM)

    P := 0 ;
    For i := 0 to s-1 do
    Begin
        P := P + a_i x B ;
        q_i := (-p_0 m_0^(-1)) mod r ;
        P := (P + q_i x M) div r ;
    End ;
    If P >= M then P := P-M

    Output: P = A B r^(-s) mod M for A = sum_{i=0}^{s-1} a_i r^i < M and B < M.

Here $m_0^{-1}$ under the mod r is the unique residue modulo r with the property
$m_0^{-1} \times m_0 \equiv 1 \bmod r$, i.e. the multiplicative inverse of $m_0$ mod r. Similarly, $r^{-s}$
appearing under the mod M is the inverse of $r^s$ modulo M. The digit products
such as $a_i \times B$ are generated over s cycles by using the k-bit multiplier to compute
each digit by digit product $a_i \times b_j$ for $0 \le j < s$ from least to most significant digit
of B, propagating carries on the way so that a non-redundant representation can
be used.
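
A minimal Python sketch of MMM is given below for concreteness. It uses Python's arbitrary-precision integers in place of the digit-by-digit hardware products, and the function name `mmm` and the returned flag `subtracted` are assumptions introduced for illustration. The modular inverse via `pow(x, -1, r)` requires Python 3.8 or later.

```python
def mmm(A, B, M, k, s):
    """Digit-serial Montgomery multiplication with base r = 2**k and s digits.

    Returns (P, subtracted) where P = A*B*r**(-s) mod M and `subtracted`
    records whether the final conditional subtraction took place.
    Requires M odd and 0 <= A, B < M < r**s.
    """
    r = 1 << k
    m0_inv = pow(M % r, -1, r)            # inverse of the lowest digit of M mod r
    P = 0
    for i in range(s):
        a_i = (A >> (k * i)) & (r - 1)    # i-th base-r digit of A
        P += a_i * B
        q_i = (-(P % r) * m0_inv) % r     # chosen so that P + q_i*M is divisible by r
        P = (P + q_i * M) // r
    subtracted = P >= M                   # the data-dependent step seen by the attacker
    if subtracted:
        P -= M
    return P, subtracted


M = 0xF123456789ABCDEF                    # arbitrary odd 64-bit modulus for the demo
A = 0x0123456789ABCDEF
B = 0x0FEDCBA987654321
P, subtracted = mmm(A, B, M, k=8, s=8)
print(P == A * B * pow(1 << 64, -1, M) % M, subtracted)
```

The flag `subtracted` is the single bit of information per modular multiplication on which the timing attack of the next section relies.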

Using induction it is readily verified for $R = r^s$ that:

Theorem 1. [22] The MMM loop has post-condition $ABR^{-1} \le P < ABR^{-1} + M$.

With the advent of timing attacks [9], the conditional subtractions should 
be avoided. This is easy to achieve by having extra loop iterations [6, 19, 22]. 
Alternatively, a non-destructive subtraction can always be performed if space is 
available, and the correct answer selected using the sign of the result. However, 
EMR measurements might still reveal this choice. 

4 The Timing Attack 

Walter and Thompson [21] observed that the final, conditional subtraction takes 
place in Montgomery’s algorithm with different frequencies for multiplications 
and squarings. Indeed, different exponent digits are also distinguished. It is as- 
sumed that the attacker can partition the traces of many exponentiations cor- 
rectly into sub-traces corresponding to each execution of MMM and use their 
timing differences to determine each instance of this final subtraction. This gives 
him a matrix $Q = (q_{ij})$ in which $q_{ij}$ is 1 or 0 according to whether or not there
is an extra subtraction at the end of the ith modular multiplication of the jth 
exponentiation. We now estimate the distance between two rows of this matrix. 

With the possible exception of the first one or two instances, it is reasonable
to assume that the I/O for each MMM within an exponentiation is uniformly
distributed over the interval 0..M-1 since crypto-systems depend on multipli-
cations performing what seem to be random mappings of multiplicands onto
0..M-1. Suppose $\pi_{mu}$ is the probability that the final subtraction takes place
in MMM for two independent, uniformly distributed inputs. Let A, B and Z be
independent, uniformly distributed, discrete random variables over the interval
of integers 0..M-1 which correspond to the MMM inputs and the variation in
output within the bounds given in Theorem 1. Then, counting a bracketed
condition as 1 when it holds and 0 otherwise,

$$\pi_{mu} = \mathrm{pr}(Z + ABR^{-1} \ge M) = \frac{1}{M^3} \sum_{Z=0}^{M-1} \sum_{A=0}^{M-1} \sum_{B=0}^{M-1} (Z + ABR^{-1} \ge M) = \frac{1}{M^3} \sum_{A=0}^{M-1} \sum_{B=0}^{M-1} ABR^{-1}.$$

So $\pi_{mu} \approx \frac{1}{4} MR^{-1}$ because M is large.

On the other hand, suppose $\pi_{sq}$ is the probability that the final subtraction
takes place when MMM is used to square a uniformly distributed input. For A
and Z as above,

$$\pi_{sq} = \mathrm{pr}(Z + A^2R^{-1} \ge M) = \frac{1}{M^2} \sum_{Z=0}^{M-1} \sum_{A=0}^{M-1} (Z + A^2R^{-1} \ge M) = \frac{1}{M^2} \sum_{A=0}^{M-1} A^2R^{-1},$$

whence $\pi_{sq} \approx \frac{1}{3} MR^{-1}$.
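
The two probabilities can be checked empirically. The sketch below is a Monte Carlo estimate under stated assumptions: a toy 32-bit odd modulus M = 2^32 - 5 with R = 2^32 (chosen here only so that M/R is close to 1), uniformly random inputs, and the closed form P = (AB + QM)/R for the value of P at the end of the MMM loop, where Q collects the quotient digits. The helper names are hypothetical.

```python
import random


def needs_subtraction(A, B, M, R, M_inv):
    """True if the MMM loop output for A*B is >= M, i.e. the extra subtraction occurs."""
    Q = (-A * B * M_inv) % R          # the concatenated quotient digits q_i
    P = (A * B + Q * M) // R          # exact division: value of P when the loop ends
    return P >= M


def estimate(M, R, trials=20000, seed=1):
    """Estimate (pi_mu, pi_sq) by sampling uniformly distributed inputs."""
    rng = random.Random(seed)
    M_inv = pow(M, -1, R)
    mu = 0
    sq = 0
    for _ in range(trials):
        A, B = rng.randrange(M), rng.randrange(M)
        mu += needs_subtraction(A, B, M, R, M_inv)   # independent inputs
        sq += needs_subtraction(A, A, M, R, M_inv)   # squaring
    return mu / trials, sq / trials


M, R = 2**32 - 5, 2**32
pi_mu, pi_sq = estimate(M, R)
print(pi_mu, M / (4 * R))    # estimate and predicted value, both near 0.25
print(pi_sq, M / (3 * R))    # estimate and predicted value, both near 0.333
```

With these parameters the two estimates should differ by roughly 0.08, which is the gap the attacker exploits when averaging over many exponentiations.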

The difference between $\pi_{mu}$ and $\pi_{sq}$ means that multiplications can be distin-
guished from squares if a sufficiently large sample of exponentiations is available.
$\pi_{mu}$ and $\pi_{sq}$ are given approximately by averaging the entries in the rows of Q.
If the binary or “square-and-multiply” exponentiation method is used, then the
pattern of squares and multiplies given by these row averages reveals the bits of 
the secret exponent D. 
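
The binary case can be illustrated with a small simulation. Everything specific in the sketch below is an assumption made for illustration: a toy modulus M = 2^32 - 5 with R = 2^32, a 16-bit secret exponent, N = 4000 observed exponentiations with random inputs treated as already being in Montgomery form, and noise-free detection of every conditional subtraction. The names `mont`, `trace` and `recover_bits` are hypothetical.

```python
import random

R = 1 << 32
M = R - 5                      # toy odd modulus close to R (assumption)
M_INV = pow(M, -1, R)


def mont(A, B):
    """Montgomery product A*B/R mod M plus a 0/1 flag for the final subtraction."""
    Q = (-A * B * M_INV) % R
    P = (A * B + Q * M) // R
    return (P - M, 1) if P >= M else (P, 0)


def trace(C, bits):
    """Subtraction flags for one left-to-right binary exponentiation of C."""
    P, flags = C, []
    for b in bits[1:]:               # the leading 1 bit costs no operation
        P, f = mont(P, P)            # squaring
        flags.append(f)
        if b:
            P, f = mont(P, C)        # multiplication for a 1 bit
            flags.append(f)
    return flags


def recover_bits(Q_rows):
    """Classify each row of Q by its average, then rebuild the exponent bits."""
    threshold = (M / (4 * R) + M / (3 * R)) / 2   # midway between pi_mu and pi_sq
    bits = [1]
    for row in Q_rows:
        if sum(row) / len(row) > threshold:
            bits.append(0)           # a squaring starts a new (provisionally 0) bit
        else:
            bits[-1] = 1             # a multiplication marks the current bit as 1
    return bits


rng = random.Random(2003)
secret = [1] + [rng.randrange(2) for _ in range(15)]
N = 4000
columns = [trace(rng.randrange(1, M), secret) for _ in range(N)]
Q_rows = list(zip(*columns))         # row i holds operation i across all N samples
print(recover_bits(Q_rows) == secret)
```

The threshold is placed midway between the two predicted averages; with smaller N the row averages overlap and the pairwise row distances discussed next become necessary.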

If the m-ary or sliding windows method is used for the exponentiation then it 
is also necessary to distinguish between multiplications corresponding to different 
exponent digits. This is done by using the fact that in the jth exponentiation, 
the same pre-computed multiplier $C_j^{(i)}$ is used whenever the exponent digit is i. 

Let $\pi_{ij}$ be the probability that the MMM input $A = C_j^{(i)}$ induces the conditional
subtraction when the argument B is uniformly distributed on 0..M-1. For Z
as before,

$$\pi_{ij} = \mathrm{pr}(Z + C_j^{(i)}BR^{-1} \ge M) = \frac{1}{M^2} \sum_{Z=0}^{M-1} \sum_{B=0}^{M-1} (Z + C_j^{(i)}BR^{-1} \ge M) = \frac{1}{M^2} \sum_{B=0}^{M-1} C_j^{(i)}BR^{-1} \approx \frac{1}{2} C_j^{(i)} R^{-1}.$$

The $C_j^{(i)}$ are uniformly distributed as j
varies. So the average value of $\pi_{ij}$ as j varies is, by definition, $\pi_{mu}$. Also, the
average value of $\pi_{ij}^2$ as j varies is $\pi^{(2)} = \frac{1}{M} \sum_{C=0}^{M-1} \frac{1}{4} C^2R^{-2} \approx \frac{1}{12} M^2R^{-2}$.

The distance between two rows of Q is defined here as the average Ham-
ming distance between corresponding entries. This is, in a sense, independent
of the sample size N, i.e. the number of columns. Thus the expected distance
between two rows which correspond to the same exponent digit i is
$d_{ii} = 2N^{-1} \sum_j \pi_{ij}(1 - \pi_{ij})$. Its average value is therefore

$$d_{eq} = \overline{d_{ii}} = 2(\pi_{mu} - \pi^{(2)}) \approx 2\left(\tfrac{1}{4} MR^{-1} - \tfrac{1}{12} M^2R^{-2}\right) = MR^{-1}\left(\tfrac{1}{2} - \tfrac{1}{6} MR^{-1}\right),$$

which is independent of N and, indeed, of i.

Now assume that the distributions of $C_j^{(i)}$ and $C_j^{(i')}$ are independent if $i \ne i'$.
This is reasonable since the RSA crypto-system relies on the fact that application
of a public exponent E = 3 to any ciphertext C should randomly permute values
modulo M. Then, if two rows of Q correspond to distinct digits i and i', their
distance apart is approximately
$d_{ii'} = N^{-1} \left( \sum_j \pi_{ij}(1 - \pi_{i'j}) + \sum_j \pi_{i'j}(1 - \pi_{ij}) \right)$.
The average value of this is $d_{neq} = \overline{d_{ii'}} = 2(\pi_{mu} - \pi_{mu}^2) \approx MR^{-1}\left(\tfrac{1}{2} - \tfrac{1}{8} MR^{-1}\right)$.

It is also possible to compare two squarings or a multiplication with a squar-
ing. In an exponentiation, except perhaps for the first one or two squarings, the
inputs to these would be independent. For a square and a multiplication involv-
ing exponent digit i, the expected distance between the rows of Q is
$d_{sq,mu} = N^{-1} \left( \sum_j \pi_{ij}(1 - \pi_{sq}) + \sum_j \pi_{sq}(1 - \pi_{ij}) \right)$. The average value of this is
$\overline{d_{sq,mu}} = \pi_{mu} + \pi_{sq} - 2\pi_{mu}\pi_{sq} = MR^{-1}\left(\tfrac{7}{12} - \tfrac{1}{6} MR^{-1}\right)$. For two squares the expected dis-
tance between the (different) rows of Q is $d_{sq,sq} = 2N^{-1} \sum_j \pi_{sq}(1 - \pi_{sq})$. The
average value of this is $\overline{d_{sq,sq}} = 2\pi_{sq}(1 - \pi_{sq}) = MR^{-1}\left(\tfrac{2}{3} - \tfrac{2}{9} MR^{-1}\right)$.

Observe that $d_{eq}$, $d_{neq}$, $\overline{d_{sq,mu}}$ and $\overline{d_{sq,sq}}$ must all be distinct because M < R.
As variance in these distances is proportional to $\frac{1}{N}$, the distance between two
rows of Q will tend to one of these four distinct values as the sample size in- 
creases, making it easier to determine whether the rows represent, respectively, 
two multiplications corresponding to the same exponent digit, two multiplica- 
tions corresponding to different exponent digits, a squaring and a multiplication, 
or two squarings. 
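
As a small numerical check, the four limiting distances can be evaluated for a given ratio x = M/R using the approximations $\pi_{mu} \approx x/4$, $\pi_{sq} \approx x/3$ and a mean of $x^2/12$ for the squared multiplication probability. The function name `expected_distances` and the sample ratio 3/4 are assumptions chosen for illustration; exact rational arithmetic is used so that the values can be compared without rounding.

```python
from fractions import Fraction


def expected_distances(x):
    """Limiting row distances for the ratio x = M/R, with 0 < x < 1."""
    pi_mu = x / 4                 # subtraction probability for a multiplication
    pi_sq = x / 3                 # subtraction probability for a squaring
    pi_2 = x * x / 12             # mean of pi_ij**2 over uniform multiplicands
    return {
        "d_eq": 2 * (pi_mu - pi_2),                       # same exponent digit
        "d_neq": 2 * pi_mu * (1 - pi_mu),                 # different exponent digits
        "d_sq_mu": pi_mu + pi_sq - 2 * pi_mu * pi_sq,     # squaring vs multiplication
        "d_sq_sq": 2 * pi_sq * (1 - pi_sq),               # two squarings
    }


d = expected_distances(Fraction(3, 4))
for name, value in d.items():
    print(name, value, float(value))
```

For x = 3/4 the four values are 9/32, 39/128, 11/32 and 3/8, so they are distinct but the first two differ by less than 0.03, which indicates why a large sample size N is needed to separate equal from unequal exponent digits.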

It is easy to develop a suitable algorithm to traverse the rows of Q and classify 
all the multiplicative operations into subsets which represent either squarings or
the same exponent digit. One such algorithm was given in [20]. The classification
might, for example, minimise the sum of the differences between the expected and 
actual distances between all pairs of rows. The set in which a pre-multiplication 
lies determines the exponent digit associated with that set. There are a few 
consistency checks which might highlight any errors, such as enforcing exactly 
one pre-multiplication in each set of multiplications, and squarings having to
appear only in multiples of $\log_2 m$ consecutive operations. This enables the secret
exponent key D to be reconstructed with sufficiently few errors to enable its 
precise determination providing the sample size N is large enough. 



5 Doubling the Key Length 

Suppose the key length is increased. Does the timing attack become more or less 
successful when the ratio M/R, the base m and the sample size N are kept the
same? We will assume that the detection rate for the conditional subtraction 
is unchanged because the detection method is unspecified. However, it seems 
likely that the subtractions will be easier to spot for longer keys since, although 
the same detectable operations are performed in both cases, there are more of 
them. The detection assumption means that, by counting only the subtractions 
in each row of Q, the same proportion of errors will be made in classifying an
operation as a square or a multiply. Doubling n will then double the number 
of such errors. However, using pairs of rows rather than single rows for this 
classification improves the likelihood of classifying multiplications correctly. 

First note that the distributions for the four types of distances between two 
rows are independent of n because the row length N is unchanged and the 
probability of a conditional subtraction is unchanged. Suppose the rows have 
already been roughly partitioned into one set for each non-zero exponent digit 
and one set for squares (m subsets in all). A row is classified, or its classification 
checked, by taking the distance between it and each of these sets. This distance 
is the average between the chosen row and each row of the group. Doubling n 
doubles the size of the group and so provides twice the number of distances from 
the row. So, as the other parameters are unchanged, the average distance from 
the row to the group will have half the variance when the key length is doubled. 
This will markedly reduce the probability of mis-classifying a row. 

There are two main types of error to consider, namely those which are de- 
tectable through inconsistencies and those which are not. Inconsistencies arise 
when squares do not appear in sequences of $\log_2 m$ or multiplications are not sepa-
rated by squares. For convenience, suppose that the inconsistent errors can be
corrected with computationally feasible effort for both key lengths n and 2n be-
cause this is a minor part of the total cost. Hence it is assumed that a possible
pattern of squares and multiplies has been obtained. For simplicity, we assume 
this is the correct pattern. Ambiguities also appear when a multiplication ap- 
pears to be near to two or more different subsets of rows or near to none. The 
attacker then knows to check each of up to m possibilities for that multiplication. 
However, his main problem is the number of incorrect, but consistent decisions. 






A mis-classified row has to be too far away from its correct subset and too 
close to one other set in order to be assigned to an incorrect exponent digit. 
For convenience, suppose that average distances from one row to the subset 
of rows representing a given exponent digit are normally distributed, and that 
appropriate scaling is performed so that the distances are N(0, 1). (Although the
distances are bounded within the interval [0,1], the previous section shows their
averages are not close to either end, and so increasing the sample size N will
improve the match with a normal distribution.) Let Z be a random variable with
such a distribution, and let $\delta$ be the (similarly scaled) distance at which the row
is equally likely to be in the group as not in it. (A similar discussion is given
in more detail in §7 for the other attack where these notions are made more
precise.) Then the mis-classification occurs with probability $\mathrm{pr}(Z > \delta)^2$. Since $\delta$
is inversely proportional to standard deviation it is dependent on the key length.
Thus $\delta = \delta_n$ increases by a factor $\sqrt{2}$ when the key length is doubled.

Suppose $p_n$ is the probability of correctly classifying one multiplication. Since
keys with 2n bits require twice as many multiplications on average, we need
$p_n < p_{2n}^2$ for the attack to become easier for the longer key. From the above,
$p_n = 1 - \mathrm{pr}(Z > \delta_n)^2$ where Z is N(0, 1) and so the condition becomes
$1 - \mathrm{pr}(Z > \delta)^2 < (1 - \mathrm{pr}(Z > \sqrt{2}\delta)^2)^2$. A quick glance at tables of the normal distribution shows
that this is true providing $\delta > 0.616$. In other words, longer keys are easier to
attack if distances between rows of Q are accurate enough. This just requires 
the sample size N to be large enough, or the experimental methods to be made 
accurate enough, or, indeed, n to be large enough. In conclusion, with all other 
aspects fixed there appears to be an optimal key length providing maximum 
security against this attack with shorter and longer keys being more unsafe. 
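
The threshold quoted above can be reproduced numerically rather than from tables. The sketch below solves for the crossover point of the condition by bisection, using the complementary error function for the normal tail; the function names and the bracketing interval [0.1, 3.0] are assumptions made for illustration.

```python
import math


def upper_tail(z):
    """pr(Z > z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def gain(delta):
    """p_2n**2 - p_n: positive when the doubled key length is easier to attack."""
    p_n = 1 - upper_tail(delta) ** 2
    p_2n = 1 - upper_tail(math.sqrt(2) * delta) ** 2
    return p_2n ** 2 - p_n


lo, hi = 0.1, 3.0                 # gain(lo) < 0 < gain(hi)
for _ in range(60):               # bisection on the sign change
    mid = (lo + hi) / 2
    if gain(mid) < 0:
        lo = mid
    else:
        hi = mid
threshold = (lo + hi) / 2
print(round(threshold, 3))
```

The computed crossover should land close to the tabulated value of 0.616; below it, doubling the key length makes this particular attack harder, and above it, easier.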

However, with more leaked data per exponent bit as key length increases, it is 
not impossible that the attack may be developed further so that there is no longer 
an optimal secure key length and all increases in key length become unsafe. For 
example, some exponentiations are more helpful than others in discriminating 
between one exponent digit and any others because the associated multiplicands 
are unusually large or unusually small. These instances become apparent while 
performing the attack just described. Then a weighted Hamming distance which 
favours these cases should improve the correct detection of the corresponding 
exponent digit. Increasing key length provides more useful data for detecting 
such cases, further decreasing the strength of longer keys. 



6 The Power Analysis Attack 

The other attack considered here is one based on data-dependent power and/or 
EMR variation from the smartcard multiplier [20]. The long integer multipli-
cation $A \times B$ requires every product of digits $a_u \times b_v$ to be computed. For each
index u, the power or EMR traces from the multiplier are averaged as v ranges
over its s values. In general, the digits of B are sufficiently random for this aver-
aging process to provide a trace which is reasonably indicative of the Hamming
weight of $a_u$. Concatenating these averaged traces for all $a_u$ provides a single
trace from which the vector of the Hamming weights of the digits of A is obtained
with reasonable accuracy. Unless s is small, the Euclidean distance between the 
Hamming weight vectors for different, randomly chosen values of A has much 
smaller variance than its average. So this distance enables equal arguments A 
to be identified and unequal ones to be distinguished. By defining distance be- 
tween power traces via the Hamming weight vectors in this way, the attacker 
can expect to distinguish between, or identify, the multipliers $C^{(i)}$ used in the
modular multiplications of an exponentiation. This enables the exponent digits 
to be discovered and hence the secret key D to be determined. 

In detail, each digit product $a_u \times b_v$ contributes $|a_u| + |b_v| + x$ to the trace-
averaging process where |d| is the Hamming weight of digit d and x is an in-
stance of a random variable X which represents measurement errors and varia-
tion caused by the initial condition of the multiplier and other hardware. Without
loss of generality, we assume $\mu_X = 0$. Averaging over the s digits of B provides
$\frac{1}{s}|B| + |a_u| + \bar{x}_s$ as the uth co-ordinate of the vector for $A \times B$. Here the aver-
age for $\bar{x}_s$ is the same as for X (namely 0) but, with realistic independence
assumptions, the variance is less by a factor s. As the s digits have k bits
each and their Hamming weights are binomially distributed, this has a mean
of $\frac{k}{2} + |a_u|$ and variance $\frac{k}{4s} + \frac{1}{s}\sigma_X^2$. Thus, overall, the co-ordinates have mean k
and variance $\frac{k}{4}(1 + \frac{1}{s}) + \frac{1}{s}\sigma_X^2$. Now, comparing the vectors from two independent
multipliers $C^{(i)}$ and $C^{(i')}$, the mean square difference in the uth co-ordinate
is $\frac{k}{2}(1 + \frac{1}{s}) + \frac{2}{s}\sigma_X^2$, leading to a mean Euclidean distance between the vectors of
$\sqrt{\frac{k}{2}(s+1) + 2\sigma_X^2}$. However, if the multipliers are equal, i.e. i = i', then the Eu-
clidean distance between the two vectors contains no contribution from $C^{(i)}$
and $C^{(i')}$. So its mean is derived entirely from the variance $\frac{k}{4s} + \frac{1}{s}\sigma_X^2$ in each
co-ordinate, namely $\sqrt{\frac{k}{2} + 2\sigma_X^2}$. Hence there is a $\sqrt{s+1}$-fold difference in size be-
tween the distances between vectors for the same and for different multiplicands
when the data is “clean”. Other variation from measurement error and hardware
initialisation is only significant for small s, which is not the case here.
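
The predicted separation can be checked with a noise-free simulation of the idealised Hamming weight model (not of a real multiplier). In the sketch below the parameters k = s = 32, the number of trials and the helper names are assumptions made for illustration, the noise term X is taken to be identically zero, and "mean distance" is interpreted as a root-mean-square value.

```python
import math
import random


def weight_vector(A, B, k, s):
    """Noise-free averaged trace: entry u is |a_u| plus the mean digit weight of B."""
    mask = (1 << k) - 1
    b_avg = bin(B).count("1") / s
    return [bin((A >> (k * u)) & mask).count("1") + b_avg for u in range(s)]


def rms_distance(k, s, same_multiplier, trials, rng):
    """Root-mean-square Euclidean distance between two weight vectors."""
    total = 0.0
    for _ in range(trials):
        A1 = rng.getrandbits(k * s)
        A2 = A1 if same_multiplier else rng.getrandbits(k * s)
        v1 = weight_vector(A1, rng.getrandbits(k * s), k, s)
        v2 = weight_vector(A2, rng.getrandbits(k * s), k, s)
        total += sum((x - y) ** 2 for x, y in zip(v1, v2))
    return math.sqrt(total / trials)


rng = random.Random(7)
k, s = 32, 32                     # e.g. 1024-bit operands on a 32-bit multiplier
same = rms_distance(k, s, True, 2000, rng)
diff = rms_distance(k, s, False, 2000, rng)
print(same, math.sqrt(k / 2))               # equal multipliers
print(diff, math.sqrt(k / 2 * (s + 1)))     # independent multipliers
print(diff / same, math.sqrt(s + 1))        # predicted ratio
```

Each printed pair gives the simulated value followed by the prediction from the formulae above; the ratio grows like the square root of s + 1, which is the source of the improved discrimination for longer keys.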

These vectors are used to form a matrix $Q = (q_{ij})$ similar to that in the tim-
ing attack: $q_{ij}$ is the weight derived from the jth digit of the ith multiplication.
As before, the distances (now Euclidean) between rows are used to distinguish 
squares from multiplies, and identify rows corresponding to the same exponent 
digits. Squares have no arguments in common with other operations, so that 
they have distances from all other rows which behave in the same way as de- 
scribed above for distances between multiplications for different exponent digits; 
they are not close to any other rows in the way that rows are for multiplica- 
tions associated with the same exponent digit. Thus, as before, the attacker can 
potentially determine the secret key D. 

7 Increasing the Key Length 

The formulae in the previous section make explicit some of the results of in- 
creasing the key length n, and hence also the number of digits s. First, the trace
averaging process is improved for individual entries in Q by reducing their vari-
ance. Secondly, as a consequence, the larger number of columns in Q increases 
the Euclidean distance between rows corresponding to different exponent digits 
without changing the distance between rows corresponding to the same digit. 
This enables each row to be classified more accurately. Thirdly, as in the tim- 
ing attack, the larger number of rows in Q reduces the variance in the average 
distance of the row from the sets of rows which represent the various exponent 
digits. This, too, enables rows to be classified more accurately. So, as well as the 
increased difference in average separation, doubling the key length halves the 
variance in the separation since the sets for each exponent digit contain twice as 
many rows on average. At a theoretical level, this more than squares the proba- 
bility of mis-classifying a digit so that the total number of digit errors actually 
decreases when key length increases. The question, as before, is whether such 
improved accuracy in classification really does achieve this in practice. 

Modelling the attack by randomly generating Hamming weights is potent- 
ially inaccurate for several reasons. For example, bus encryption should hide 
Hamming weight effectively. Secondly, the multiplier does not necessarily yield 
the Hamming weight of inputs with much accuracy. Lastly, the multiplier does 
not operate independently on different input digits: processing one digit of a long 
integer input sets the multiplier in a biased state which affects the amount of 
switching when the next digit is processed. 

So it was decided to assume the attacker made observations of EMR from, 
and power use by, the multiplier which would enable him to estimate the number 
of gates being switched in the multiplier. A model was built of the typical gate 
layout in a $2^k$-bit multiplier using full adders and half adders in a Wallace tree
without Booth re-coding. Random long integers were generated, and their dig- 
its fed sequentially into the multiplier as in a long integer multiplication. Gate 
switching activity was counted for each clock cycle, and the averaging and con- 
catenation processes of the previous section were used to generate a row of the 
matrix Q. In this way m-ary exponentiation was modelled and a large number 
of values obtained for Q. Key length was varied to obtain an insight into general 
behaviour and confirm the improved feasibility of the attack as n increases. 

Table 1. Gate Switch Statistics for 32-bit Multiplier with m = 8

Bit length n        32      64      128     256     512     1024      2048
Av btwn same        266     255     234     201     177     176       171
SD btwn same        191     161     137     129     106     110       100
Av min to diff      68.4    146     324     434     843     1453      2153
SD min to diff      53.4    87.9    78      102     140     131       118
%age errors         83      71      16      1.8     0.02    0.00      0.00
SDs btwn avs        -       -       0.84    2.02    5.41    10.6      18.2
p_c (lowr. bnd.)    -       -       0.439   0.711   0.9932  0.999...  0.999...

Figures from the simulation of 8-ary exponentiation with a standard 32-bit
multiplier and various key lengths are given in Table 1. This is the largest multi-
plier likely to be found in current smartcards. Standard m-ary exponentiation
was used, not the sliding windows version, so that there were m-1 pre-computed
powers for use in multiplications. These choices of k, m and algorithm make the
correct association of exponent digits more difficult than is typically the case. So 
the setting here provides one of the most unfavourable scenarios for the attack, 
except for the absence of measurement noise. Moreover, the refinement of making 
comparisons between every pair of rows was neglected: just making comparisons 
of a row with each of the pre-computed multiplications was enough to establish 
the principle that longer key lengths may be less secure. 

The column headings in Table 1 provide the key length: the number of bits n 
used in both the modulus M and the exponent D. Values are chosen to illustrate 
the effect of doubling key length. Although the smaller values are totally insecure, 
they allow a much clearer appreciation of the overall trends, especially in the 
last three rows of the table where values tend to their limits very quickly. The 
first line of data provides the average distance between vectors formed from 
multiplications which correspond to equal exponent digits. These confirm the
theory of the previous section: for equal digits, the average distance is essentially 
constant except for a slight decline arising from reduced noise as key length 
increases. The second line records the standard deviation in the figures of the 
first line. These also show a steady decrease as n increases. They are consistently 
about two thirds of the associated means over the given range. 

The third and fourth lines provide the average distance, and its standard devi- 
ation, of a multiplication from the nearest pre-computed case which corresponds 
to a different exponent digit. If exponent digits are assigned to multiplications 
on the basis of the nearest pre-computation trace, these lines give a basis for 
estimating the number of errors that would be made. The percentage of such 
errors encountered in the simulations is given in the following line. For the very 
smallest cases, the nearest pre-computation vector is, on average, nearer than 
that corresponding to the same exponent digit. So a large number of errors are 
made. For the 128-bit or larger keys that are encountered in practice, the near- 
est pre-computation is usually the correct one. Due to its marginally different 
definition, the average tabulated in line 3 behaves slightly differently from the 
average between any pair of multiplications corresponding to different exponent 
digits. However, as with the formula in the previous section, this distance in- 
creases markedly with the key length, so that multiplications corresponding to 
different exponent digits are distinguished more easily. The standard deviations 
in line 4 varied noticeably between different samples generated by the multiplier 
simulation even for large samples (with the largest s.d. being around 50% greater 
than the smallest), but there seems to be a gradual trend upwards. 

Both lines of standard deviations are included in order to calculate how many 
distances might be incorrectly classified as corresponding to equal or unequal dig- 
its. The average was taken of the two standard deviations in each column and the 
number of them which separate the two averages in the column was computed. 
This is tabulated in the second last line. The final line of the table contains an 
estimate from normal distribution tables for the probability $p_c$ that the nearest
trace of a pre-computation correctly determines the exponent digit associated
with a multiplication, given that the operation is indeed a multiplication rather
than a squaring. This assumes that the distances between traces sharing the
same multiplicand are normally distributed with the tabulated expectation and
variance and, similarly, that the minimum distance between one trace and a set
of $m - 2$ traces, all with different multiplicands, is normally distributed with the
given expectation and variance.

Thus, let $Z_c$ be a random variable with distribution $N(\mu_c, \sigma_c^2)$ which gives
the distance to the correct trace, and let $Z_d$ be a random variable with
distribution $N(\mu_d, \sigma_d^2)$ which gives the distance to the nearest incorrect trace
(one for a different digit). Then the probability of a correct decision is
$p_c \approx \Pr(Z_c < Z_d)$. Since, by the table, the means are so many standard
deviations apart for typical values of n, a good approximation is given by

$$\Pr(Z_c < Z_d) \approx \Pr\left(Z_c < \frac{\sigma_c\mu_d + \sigma_d\mu_c}{\sigma_c + \sigma_d}\right) \Pr\left(Z_d > \frac{\sigma_c\mu_d + \sigma_d\mu_c}{\sigma_c + \sigma_d}\right).$$

This yields $p_c \approx \Pr\left(Z < \frac{\mu_d - \mu_c}{\sigma_c + \sigma_d}\right)^2$ where Z is an N(0, 1) random variable. The
last line of the table gives these values directly from tables of the normal
distribution. This is approximately the probability of identifying the correct
exponent digit given that the operation is a multiplication. It is consistent with
the observed number of errors recorded in the table.
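
The approximation can be evaluated numerically without normal distribution
tables. The following Python sketch is an illustration only: the function names
and the sample means and standard deviations are assumptions chosen for the
example and are not values taken from the table.

```python
import math


def phi(x):
    """Standard normal CDF, i.e. Pr(Z < x) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def threshold(mu_c, sigma_c, mu_d, sigma_d):
    """Decision threshold (sigma_c*mu_d + sigma_d*mu_c) / (sigma_c + sigma_d)."""
    return (sigma_c * mu_d + sigma_d * mu_c) / (sigma_c + sigma_d)


def p_correct(mu_c, sigma_c, mu_d, sigma_d):
    """Approximate p_c as Pr(Z < (mu_d - mu_c) / (sigma_c + sigma_d)) squared."""
    return phi((mu_d - mu_c) / (sigma_c + sigma_d)) ** 2


# Hypothetical inputs, for illustration only (not from the table).
mu_c, sigma_c, mu_d, sigma_d = 10.0, 2.0, 25.0, 3.0
t = threshold(mu_c, sigma_c, mu_d, sigma_d)
# The threshold lies the same number of standard deviations above mu_c
# as it lies below mu_d, which is why both factors reduce to the same value.
above_c = (t - mu_c) / sigma_c
below_d = (mu_d - t) / sigma_d
estimate = p_correct(mu_c, sigma_c, mu_d, sigma_d)
```

Standardizing both factors of the product against this common threshold gives
the same argument for each, which is where the square in the formula comes from.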

These probabilities rise rapidly as the key length is doubled, in each case to
more than the square root of the previous value: $p_c^{(128)} = 0.4386$,
$p_c^{(256)} = 0.7114$, $p_c^{(512)} = 0.9932$ and $p_c^{(1024)} = 1 - 2.5 \times 10^{-7}$ easily satisfy
$p_c^{(128)} < \{p_c^{(256)}\}^2$, $p_c^{(256)} < \{p_c^{(512)}\}^2$ and $p_c^{(512)} < \{p_c^{(1024)}\}^2$.
This means that it is easier to identify the exponent digits correctly
for a pair of multiplications where the key has 2n bits than it is to identify the
exponent digit correctly for a single multiplication where the key has only n bits.
Thus fewer errors will be made for the 2n-bit key. Indeed, the total number of
predicted errors decreases rapidly towards zero over the range of the table. Thus,
since squares are detected in a very similar way (they are not close to any of the
pre-computed powers), at least in the simulation it becomes easier to deduce the
full secret exponent as key length increases.
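
The three inequalities can be checked directly from the quoted probabilities.
The short Python sketch below does only that; the dictionary and function names
are illustrative assumptions.

```python
# Probabilities p_c quoted in the text, indexed by key length in bits.
p_c = {128: 0.4386, 256: 0.7114, 512: 0.9932, 1024: 1 - 2.5e-7}


def doubling_helps(p, n):
    """True if one digit of an n-bit key is harder to identify correctly
    than a pair of digits of a 2n-bit key, i.e. p(n) < p(2n)^2."""
    return p[n] < p[2 * n] ** 2


results = {n: doubling_helps(p_c, n) for n in (128, 256, 512)}
```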

The analysis above has not taken into account comparisons between all possible
pairs of product traces: distances between pairs of computation stage operations
can be evaluated as well. As noted earlier, each row of Q can be compared
with O(n) other rows instead of just O(1) rows. This decreases the variances
involved and thereby improves the above decision process as n increases. Hence,
as key length increases, a full scale attack will become even easier to process
correctly than is indicated by the table.



8 A Particular Example 

Consider the case from Table 1 which is closest to that in a typical current
smartcard, namely a key length of 1024 bits. This will require about 1023 squares,
which will occur in 341 triplets for $m = 2^3$, and about $\frac{7}{8} \times 1023/3 \approx 298$
multiplications. By examining the exponent from left to right, of all the squares it is
necessary to identify only the first of each triplet correctly. Hence there are
approximately 341 + 298 = 639 operations to identify as squares or multiplications,
after which each multiplication must be associated with a digit value.

Let $\mu_{sq}$ and $\mu_{mu}$ be the average distances of a square and multiplication
from the nearest of the $m-1$ pre-computation traces and let $\sigma_{sq}$ and $\sigma_{mu}$ be the
corresponding standard deviations. A multiplication is assumed if, and only if,
the distance is less than $\frac{\sigma_{sq}\mu_{mu} + \sigma_{mu}\mu_{sq}}{\sigma_{sq} + \sigma_{mu}}$. The probability $p_{sm}$ of this being the
correct decision is then $\Pr\left(Z < \frac{\mu_{sq} - \mu_{mu}}{\sigma_{sq} + \sigma_{mu}}\right)$ for an N(0,1) random variable Z.

For larger m as here, $\mu_{mu} \approx \mu_c$ and $\mu_{sq} \approx \mu_d$ so that $p_{sm} \approx \sqrt{p_c}$. So all
the squares and multiplications are identified correctly with probability at least
$p_{sm}^{639} \approx p_c^{319.5}$. Correct determination of the exponent digits for the 298 or so
multiplications is done with probability about $p_c^{298}$. Hence, without making any
use of the considerable data given by comparing the $(1023+298)^2/2$ or so pairs
of computation phase traces, deduction of the correct exponent will occur with
probability at least about $p_c^{298+319.5} \approx (1 - 2.5 \times 10^{-7})^{617.5} \approx 0.9998$.
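
The arithmetic of this example can be reproduced in a few lines. The Python
sketch below uses only the figures quoted above; the variable names are
illustrative assumptions.

```python
key_bits = 1024
digit_bits = 3                      # m = 2^3, so squares occur in triplets
p_c = 1 - 2.5e-7                    # value quoted for 1024-bit keys

squares = key_bits - 1              # about 1023 squares
triplets = squares // digit_bits    # 341 triplets
# 7 of the 8 digit values are non-zero and need a multiplication.
multiplications = round((7 / 8) * squares / digit_bits)   # about 298
operations = triplets + multiplications                   # 639

# Square/multiplication decisions succeed with p_sm ~ sqrt(p_c) each,
# digit decisions with p_c each.
exponent = multiplications + operations / 2               # 298 + 319.5
p_success = p_c ** exponent
```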

At least theoretically, this very high probability should give cause for concern. 
In practice, it is to be hoped that measurement noise will substantially reduce 
the ability to identify the correct multiplicand C^. Even a modest reduction 
in the probabilities p mu and p sq would be helpful since both must be raised to 
a power linear in the number of bits in the key in order to obtain the probability 
that the key is identified correctly first time without further computing to try 
the next best alternatives. It is computationally feasible to correct only one or 
two errors. 

9 Counter-Measures 

Counter-measures for the timing attack are straightforward, and were covered in 
the introduction: the exponent can be randomly blinded [9] and the subtraction 
can either be performed every time or be omitted every time [6, 19, 22]. 

The power analysis attack is harder to perform, but also harder to defeat. 
Exponent blinding is not a defence. The attack does rely on re-use of the pre- 
computed powers. Hence performing m-ary exponentiation in the opposite order, 
namely from least to most significant digit, may be a solution. This can be done 
without significant extra computation time [15, 23]. For m = 2, the normal 
square-and-multiply algorithm can be modified to square-and-always-multiply, 
but this is more expensive time-wise.
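
For illustration, a minimal Python sketch of the square-and-always-multiply
variant for m = 2 follows. It shows only the fixed sequence of operations; the
function name is an assumption, and Python integer arithmetic and the final
selection are not constant-time, so this is not a hardened implementation.

```python
def square_and_always_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that performs a multiplication
    for every exponent bit, whether or not the result is used."""
    result = 1 % modulus
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        product = (result * base) % modulus   # computed for every bit
        # Keep the product only when the bit is 1; otherwise discard it.
        result = product if bit == "1" else result
    return result
```

Each exponent bit now costs one squaring and one multiplication, which is the
extra time expense noted above.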

Apart from hardware counter-measures such as a Faraday cage to shield 
the processor and capacitors to smooth out the power trace, a larger multiplier 
also helps. This reduces the number of digits s over which averages are taken 
and reduces the number s of concatenated traces. Thus it reverses the effect of 
increased key length on the traces. Moreover, with larger numbers of words shar- 
ing the same Hamming weight, it becomes less easy to use the Euclidean metric 
to separate the different multiplicands. Further, one might use two multipliers 
in parallel. Montgomery’s modular multiplication algorithm naturally uses two.
Then the power used by one might successfully shield observation of the power
used by the other. Thus safety can be bought, but perhaps only at a price. 

10 Conclusion 

Two attacks on smartcard implementations of RSA have been outlined, one 
a timing attack and the other a power analysis attack. In each case the effect 
of increasing the key length was studied for its impact on the number of bit 
errors made in deducing the secret key. For the timing attack, leaked data is 
proportional to the square of the key length and it appears that there is an 
optimal secure length with both shorter and longer keys being less safe. For the 
power analysis attack, leaked data is proportional to the cube of the key length 
and the analysis shows that longer keys are less safe. 

There are a number of both algorithmic and hardware counter-measures 
which improve resistance against such side channel attacks and they should pro- 
vide the extra safety that one has been traditionally led to expect from longer key 
lengths. However, the main conclusion is that counter-measures to cryptanalysis 
must not be assumed to be suitable as counter-measures to side channel leakage. 
In particular, increasing key length on its own appears to be quite unsuitable as 
a counter-measure in embedded RSA cryptosystems.

Abstract. Differential power analysis (DPA) attacks can be of major
concern when applied to cryptosystems that are embedded into small 
devices such as smart cards. To immunize elliptic curve cryptosystems 
(ECCs) against DPA attacks, recently several countermeasures have been 
proposed. A class of countermeasures is based on randomizing the paths 
taken by the scalar multiplication algorithm throughout its execution 
which also implies generating a random binary signed-digit (BSD) rep- 
resentation of the scalar. This scalar is an integer and is the secret key of 
the cryptosystem. In this paper, we investigate issues related to the BSD 
representation of an integer such as the average and the exact number of 
these representations, and integers with maximum number of BSD repre- 
sentations within a specific range. This knowledge helps a cryptographer 
to choose a key that provides better resistance against DPA attacks. 
Here, we also present an algorithm that generates a random BSD repre- 
sentation of an integer starting from the most significant signed bit. We 
also present another algorithm that generates all existing BSD represen- 
tations of an integer to investigate the relation between increasing the 
number of bits in which an integer is represented and the increase in the 
number of its BSD representations. 

Keywords: Differential power analysis, elliptic curve cryptosystems, bi- 
nary signed-digit representation, scalar multiplication, smart cards. 



1 Introduction 

Smart cards and wireless devices are vulnerable to a special type of attack,
namely side-channel attacks. For these systems, a correct implementation of
a strong protocol is not necessarily secure if this implementation does not take 
into account the leakage of secret key information through side channels. Exam- 
ples of side channels are execution time [8], computational faults [1] and power 
consumption [9, 12, 13]. The mentioned systems are vulnerable since the task 
of monitoring the power consumption or the execution time of cryptographic 
protocols running on them is relatively easy. 

Power Analysis attacks are those based on monitoring and analyzing the 
power consumption of devices while executing a cryptographic algorithm in or- 
der to obtain significant information about the secret key. Kocher et al. were the 



M. Matsui and R. Zuccherato (Eds.): SAC 2003, LNCS 3006, pp. 58–72, 2004.
© Springer-Verlag Berlin Heidelberg 2004



On Randomizing Private Keys to Counteract DPA Attacks 




first to present attacks based on simple and differential power analysis (referred 
to as SPA and DPA respectively). In SPA, a single power trace of a cryptographic 
execution is measured and analyzed to identify large features of a cryptographic 
algorithm and specific operations that are typical of the underlying cryptosys- 
tem. On the other hand, to mount a DPA attack, an attacker collects hundreds of 
power signal traces, and manipulates them with statistical techniques to extract 
the differential signal. Therefore, DPA is in general more powerful than SPA. 

Coron [3] has explained how power analysis attacks can be mounted on ECCs,
which are well suited to memory-limited devices because of the small size of
their keys, and suggested countermeasures against both types of attack. Other
authors have proposed DPA countermeasures for both general and specific types
of elliptic curves [4, 5, 6, 10, 15, 16]. Specifically, the approaches in [4, 16]
are each based on randomizing the number and the sequence of operations
executed in the scalar multiplication algorithm. This randomization consists of
inserting a random decision in the process of building the representation of the
scalar k. The algorithms to which this randomization is applied were originally
proposed to speed up the elliptic curve (EC) scalar multiplication. They are
based on replacing the binary representation of the scalar k by another
representation with fewer nonzero symbols by allowing negative symbols to be
inserted (e.g. the binary signed-digit (BSD) representation of an integer). This
yields fewer executions of the point addition (or subtraction) in the EC scalar
multiplication algorithm. This speedup is possible because the negative of
a point on an elliptic curve is available at no extra computational cost. The
proposed algorithms compute what we refer to as the canonical BSD (also known
as the sparse or NAF) representation of the integer k by scanning the bits of its
binary representation starting from the least significant bit. A brief overview of
the EC scalar multiplication and the possible power attacks on it is presented in
Sect. 2.
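
As an illustration of the right-to-left scan, the Python sketch below computes
the NAF of a positive integer using the standard textbook recoding. It is not
the randomized algorithm of [4] or [16], and the function names are assumptions
made for this example.

```python
def naf(k):
    """Return the NAF digits of k >= 0, least significant digit first.
    Each digit is -1, 0 or 1 and no two adjacent digits are both nonzero."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)   # 1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def value(digits):
    """Integer represented by signed digits given least significant first."""
    return sum(d * 2 ** i for i, d in enumerate(digits))


example = naf(7)   # 7 = 8 - 1, one sbit longer than the binary form 111
```

For k = 7 the binary form has three nonzero bits while the NAF has two, which
is the saving in point additions referred to above.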

When we consider the DPA countermeasures based on randomizing the BSD
representation of the integer k, some natural questions arise. What is the
average number of BSD representations for an integer k represented in n bits,
where the BSD representations are to be of length l signed bits (which we will
refer to as sbits) for both cases l = n and l = n+1? Among the integers
represented in n bits, which one has the maximum number of BSD representations?
The latter may affect the choice of the secret scalar for cryptosystems that
adopt this countermeasure. In this paper, we answer those questions in Sect. 3.
In the same section, we also present an algorithm that calculates the exact
number of those representations for any integer k in O(n).

In Sect. 4, we present an algorithm that generates a random BSD representation
for an integer k by scanning its bits starting from the most significant bit
in O(n) and another algorithm that generates all BSD representations for an
integer k in $O(3^{\lfloor n/2 \rfloor})$ in the worst case. We also demonstrate the effect of
increasing n on the number of BSD representations of k. Many of the proofs and
examples have been omitted due to space limitations. For a complete version of
this article, we refer the reader to the corresponding CACR technical report published
at http://www.cacr.math.uwaterloo.ca/techreports/2003/corr2003-11.ps



2 EC Scalar Multiplication and Power Analysis Attacks 

In the context of ECCs, the EC scalar multiplication is the core cryptographic 
operation performed. This operation computes Q = kP , i.e. the addition of P to 
itself k times, where Q and P are two points on a predefined elliptic curve over 
a finite field and are public, and k is a secret integer. The EC scalar multiplication 
is conventionally performed using the double-and-add algorithms, which are also
referred to as the binary algorithms [7, Sect. 4.6.3]. The doubling of a point and
the addition of two points on an elliptic curve are operations performed using
basic finite field operations as explained in [11].
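
The control flow of the left-to-right double-and-add algorithm can be sketched
as follows. To keep the example short, the group operations are passed in as
functions and the demonstration uses the additive group of integers modulo
a small prime as a stand-in. That toy group, the modulus 1009 and all names
below are assumptions made for illustration; it is not an elliptic curve and
offers no security.

```python
def double_and_add(k, P, add, double, identity):
    """Compute kP by scanning the bits of k from most to least significant."""
    Q = identity
    for bit in bin(k)[2:]:
        Q = double(Q)            # one doubling per bit of k
        if bit == "1":
            Q = add(Q, P)        # one addition per nonzero bit of k
    return Q


# Toy stand-in group (assumption): integers modulo 1009 under addition.
MODULUS = 1009


def toy_add(a, b):
    return (a + b) % MODULUS


def toy_double(a):
    return (2 * a) % MODULUS


Q = double_and_add(45, 5, toy_add, toy_double, 0)
```

The additions are executed only for the nonzero bits of k, so the sequence of
operations depends on the secret scalar.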

The side channel attacks on the EC scalar multiplication aim to discover the 
value of k. We assume that the attacker gets hold of the cryptographic token per- 
forming the EC scalar multiplication. Coron [3] has explained the possible SPA 
and DPA attacks that can be mounted on this token. He suggested a reasonable 
SPA countermeasure at the expense of the execution time of the algorithm. We 
will focus here on DPA countermeasures. 

To mount a DPA attack, the attacker inputs to the token a large number of 
points and collects the corresponding power traces. As Coron [3] and Hasan [5] 
have explained, knowing the input point to the algorithm, the attacker starts
by guessing the bits of the scalar k, beginning with the first bit involved in the
computation, and accordingly computes the intermediate value for each point
he inputs to the algorithm after the bit considered is processed. Based on the
representation of the intermediate results, he chooses some partitioning function
to partition the power traces he collects and processes them with statistical
techniques, such as averaging, to determine the real value of the bit considered.
The attacker then repeats this procedure with every bit of the scalar. 

Many of the countermeasures proposed to defeat DPA were based mainly
on some form of randomization. They suggested randomizing (or blinding)
either the point P or the scalar k. For example, one of the countermeasures
proposed by Coron [3] suggested randomizing the value of k modulo the order of
the point P; this is done as the first step in the scalar multiplication algorithm.
Another type of countermeasure, the one of interest to us, is based on modifying
the representation of the scalar k throughout the execution of the scalar
multiplication algorithm. The representation of k to which this approach is applied
in [4, 16] is the BSD representation. The underlying idea is as follows. The
right-to-left EC scalar multiplication can be sped up by computing the NAF of k
along with performing the corresponding doubling and addition (or subtraction)
of points [4]. Random decisions can be inserted in the NAF generating algorithms
so that they do not generate the NAF of k but any of the possible BSD
representations of k (including the NAF). A different representation for the scalar k
is chosen, and hence a different sequence of EC operations is followed, every
time the algorithm is executed. We refer the reader to the paper by Oswald and
Aigner [16] and the one by Ha and Moon [4] for the details since they are not
relevant to the work presented here.

3 Number of Binary Signed Digit Representations 

Considering the binary representation of k as one of its BSD representations,
different BSD representations for k can be obtained by replacing $01$ with $1\bar{1}$ and
vice versa and by replacing $0\bar{1}$ with $\bar{1}1$ and vice versa [17, equations 8.4.4- and
8.4.5-]. The binary representation of k must include at least one 0 that precedes
a 1, so that starting from it we can obtain other BSD representations.

We consider in this case positive integers k, i.e. $0 < k < 2^n$. In general, the
integers x, represented in l = n sbits in BSD system, would be in the range
$-2^l < x < 2^l$. There are $3^l$ different combinations for x. Also, in this system, for
every positive integer corresponds a negative one obtained by exchanging the 1s
with $\bar{1}$s and vice versa. Hence, the total number of non-negative combinations is
$\frac{3^l + 1}{2}$. This result can be confirmed using the lemmas below.
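
For small l the count can also be confirmed by exhaustive enumeration. The
Python sketch below is a brute-force check only, with illustrative names; it is
exponential in l and is not one of the algorithms of this paper.

```python
from itertools import product


def count_non_negative(l):
    """Count the l-sbit strings over {-1, 0, 1} whose value is >= 0."""
    count = 0
    for digits in product((-1, 0, 1), repeat=l):
        x = sum(d * 2 ** i for i, d in enumerate(digits))
        assert -(2 ** l) < x < 2 ** l      # range stated in the text
        if x >= 0:
            count += 1
    return count


counts = {l: count_non_negative(l) for l in range(1, 8)}
```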



3.1 Useful Lemmas 

In this subsection, we present a number of lemmas related to the number of BSD 
representations of an integer k. These lemmas will be used to derive the main 
results of this article. 

Let $\lambda(k, n)$ be the number of BSD representations of k for $0 \le k < 2^n$ that
are l = n sbits long. Then the following lemmas hold.

Lemma 1. (i) $\lambda(0, n) = 1$, (ii) $\lambda(1, n) = n$, (iii) $\lambda(2^i, n) = n - i$.

Lemma 2. For $2^{n-1} \le k \le 2^n - 1$, $\lambda(k, n) = \lambda(k - 2^{n-1}, n - 1)$.

Lemma 3. For k even, $\lambda(k, n) = \lambda(k/2, n - 1)$.

Lemma 4. For k odd,

$$\lambda(k, n) = \lambda(k - 1, n) + \lambda(k + 1, n),$$

or

$$\lambda(k, n) = \lambda\left(\frac{k - 1}{2}, n - 1\right) + \lambda\left(\frac{k + 1}{2}, n - 1\right).$$
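
Lemmas 3 and 4 already give a simple recursive way to evaluate the count. The
Python sketch below implements that recursion and compares it with exhaustive
enumeration for small n. It is an illustration derived from the lemmas, not
the counting algorithm presented later in this section; the function names and
the convention that the count is 0 when k does not fit in n sbits are
assumptions.

```python
from collections import Counter
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def lam(k, n):
    """Number of BSD representations of k >= 0 that are n sbits long."""
    if k < 0:
        raise ValueError("only non-negative k are considered here")
    if k >= 2 ** n:
        return 0      # assumed convention: k does not fit in n sbits
    if k == 0:
        return 1      # Lemma 1(i): only the all-zero string
    if k % 2 == 0:
        return lam(k // 2, n - 1)                                 # Lemma 3
    return lam((k - 1) // 2, n - 1) + lam((k + 1) // 2, n - 1)     # Lemma 4


def brute_force_counts(n):
    """Tally the values of all 3^n strings of n sbits."""
    counts = Counter()
    for digits in product((-1, 0, 1), repeat=n):
        counts[sum(d * 2 ** i for i, d in enumerate(digits))] += 1
    return counts
```

For example, `lam(1, 5)` and `lam(4, 5)` reproduce parts (ii) and (iii) of
Lemma 1 with n = 5.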



3.2 Number of BSD Representations of Length l = n + 1 

The algorithms that we are concerned with as DPA countermeasures are based 
on algorithms that generate the NAF representation of an integer. Since the 
NAF of an integer may be one sbit longer than its binary representation, we are 
interested in knowing the number of BSD representations of an integer k that 
are l sbits long, where in this case l = n + 1. 



